A Pacemaker-based HA cluster for LXC and Docker container virtualization

In this article I will describe the installation and configuration of an active/active cluster based on Pacemaker, Corosync 2.x and CLVM using shared storage, and show how to adapt it to run LXC and Docker containers. I will cover the commands used to work with the cluster and recall the pitfalls I ran into, which will hopefully make life easier for those who walk the same path.

As the server distribution I will use CentOS 7 with the EPEL repository and the current package versions from them. The main tool for working with Pacemaker will be pcs (pacemaker/corosync configuration system).





  1. Server Preparation
  2. Installation and basic setup of Pacemaker and CLVM
  3. Working with LXC in a cluster
  4. Transferring an OpenVZ container to LXC
  5. Working with Docker in a cluster
  6. Cheat Sheet
  7. References


Server Preparation


I used a two-node configuration, but the number of nodes can be increased as needed. The servers have shared storage connected via SAS. If that is not at hand, you can use storage connected over FC or iSCSI. Two volumes are needed: one for general use and one for Docker. Alternatively, a single volume can be split into two partitions.
Install CentOS 7 and the EPEL repository and configure the network. Bonding for the network interfaces and multipath for SAS are desirable. To work with the various VLANs we configure the corresponding br0.VID bridges, to which the LXC containers will later be attached. I will not describe the details; it is all standard.

For LXC and Docker to work, the stock firewalld must be disabled:
# systemctl stop firewalld.service
# systemctl disable firewalld.service
# setenforce Permissive
We also put SELinux into permissive mode to make setup and bold experimentation easier. Later, once everything is debugged, it can be switched back.
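If you want permissive mode to survive a reboot, the same can be pinned in the config file (a small one-liner of my own; it assumes the stock SELINUX=enforcing line is present):
# sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config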

Immediately add the necessary addresses and names to /etc/hosts on all nodes:
#nodes, vlan 10
10.1.0.1      cluster-1
10.1.0.2      cluster-2
#nodes ipmi, vlan 314
10.1.15.1      ipmi-1
10.1.15.2      ipmi-2
#docker, vlan 12
10.1.2.10       docker
10.1.2.11       dregistry


The cluster will need a working STONITH mechanism ("Shoot The Other Node In The Head"), for which we use IPMI. Configure it with ipmitool:
# ipmitool shell
ipmitool> user set name 2 admin
ipmitool> user set password 2 'very secret password'
# set privileges: user priv <user id> <privilege level> <channel>
ipmitool> user priv 2 4 1
This creates the admin user (id=2) and grants it administrator rights (privilege level=4) on the channel associated with the network interface (channel=1).

It is desirable to move the IPMI network into a separate VLAN: first, this isolates it; second, it avoids connectivity problems when the IPMI BMC (baseboard management controller) shares a network interface with the server.
ipmitool> lan set 1 ipsrc static
ipmitool> lan set 1 ipaddr 10.1.15.1
ipmitool> lan set 1 netmask 255.255.255.0
ipmitool> lan set 1 defgw ipaddr 10.1.15.254
ipmitool> lan set 1 vlan id 314
# access settings:
ipmitool> lan set 1 access on
ipmitool> lan set 1 auth ADMIN MD5
ipmitool> channel setaccess 1 2 callin=on ipmi=on link=on privilege=4
Configure the other nodes the same way; only the IP addresses differ.

Check connectivity like this:
# ipmitool -I lan -U admin -P 'very secret password' -H 10.1.15.1 bmc info

Installation and basic setup of Pacemaker and CLVM


If you do not know what Pacemaker is, it is advisable to read about it first; there is a good write-up on it in Russian.
On all nodes, install packages from the epel repository:
# yum install pacemaker pcs resource-agents fence-agents-all

On all nodes, set a password for the hacluster cluster administration user. pcs operates under this user, and the web management interface is also available with it.
echo CHANGEME | passwd --stdin hacluster 


Further operations are performed on one node.
Configure authentication:
# pcs cluster auth cluster-1 cluster-2 -u hacluster -p CHANGEME --force 

We create and start a two-node cluster named Cluster:
# pcs cluster setup --force --name Cluster cluster-1 cluster-2
# pcs cluster start --all 

We look at the result:
# pcs cluster status
Cluster Status:
 Last updated: Wed Jul  8 14:16:32 2015
 Last change: Wed Jul  8 10:01:20 2015
 Stack: corosync
 Current DC: cluster-1 (1) - partition with quorum
 Version: 1.1.12-a14efad
 2 Nodes configured
 17 Resources configured (more to come)
PCSD Status:
  cluster-1: Online
  cluster-2: Online

One caveat: if https_proxy is set in the environment variables, pcs may misreport the node status, apparently because it tries to go through the proxy.

Start the pcsd daemon and enable it at boot:
# systemctl start pcsd
# systemctl enable pcsd

After that, the web interface for managing the cluster is available at https://<node_ip>:2224.


The interface allows you to view the cluster status and add or change its parameters. A small thing, but a nice one.

Since we have only two nodes, there can be no real quorum, so this policy needs to be disabled:
# pcs property set no-quorum-policy=ignore 

To have the cluster nodes start automatically, simply enable pacemaker at boot:
# systemctl enable pacemaker


CLVM and GFS2 require DLM (Distributed Lock Manager) to work. In RHEL 7 (CentOS 7) neither CLVM nor DLM exists as a stand-alone daemon; both run as cluster resources. DLM in turn requires STONITH, otherwise the corresponding cluster resource will not start. Configure it:
# pcs property set stonith-enabled=true
# pcs stonith create cluster-1.stonith fence_ipmilan ipaddr="ipmi-1" passwd="ipmi password" login="admin" action="reboot" method="cycle" pcmk_host_list=cluster-1 pcmk_host_check=static-list stonith-timeout=10s op monitor interval=10s
# pcs stonith create cluster-2.stonith fence_ipmilan ipaddr="ipmi-2" passwd="ipmi password" login="admin" action="reboot" method="cycle" pcmk_host_list=cluster-2 pcmk_host_check=static-list stonith-timeout=10s op monitor interval=10s
# pcs constraint location cluster-1.stonith avoids cluster-1=INFINITY
# pcs constraint location cluster-2.stonith avoids cluster-2=INFINITY

Why it is done this way is described well elsewhere. In short, we create two stonith resources, each responsible for its own node, and forbid each one from running on the node it is supposed to shoot.

Set up additional global parameters.
For experiments, it is useful to configure resource migration after the first failure:
# pcs resource defaults migration-threshold=1 

So that a resource which migrated to another node after a failure does not move back once the failed node recovers, set:
#  pcs resource defaults resource-stickiness=100
Where "100" is a certain weight on the basis of which Pacemaker calculates the behavior of the resource.

So that the nodes do not shoot each other in the middle of bold experiments, I recommend explicitly setting the policy for resource failures:
# pcs resource op defaults on-fail=restart
Otherwise stonith will fire at the most interesting moment, because by default a failed stop operation triggers fencing.

Install CLVM on each node:
# yum install lvm2 lvm2-cluster

We configure LVM to work in a cluster on each node:
# lvmconf --enable-cluster

We create the dlm and clvmd resources in the cluster:
# pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
# pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence clone interleave=true ordered=true
These are critical resources for our cluster, so on failure we explicitly invoke stonith (on-fail=fence). Each resource must run on every node of the cluster, which is why it is declared cloned (clone). The clone instances are started one after another rather than in parallel (ordered=true). If a resource depends on other clone resources, it does not wait for every instance of them to start on all nodes but is satisfied with the local one (interleave=true). Pay attention to the last two parameters; they can significantly affect the operation of the cluster as a whole, and we will have more clone resources later.

We define the start order so that clvmd starts only after dlm. The commands use the *-clone names, which refer to the clone instances on the individual nodes:
# pcs constraint order start dlm-clone then clvmd-clone
# pcs constraint colocation add clvmd-clone with dlm-clone
We also require the clvmd-clone resource to run together with dlm-clone on the same node. With just two nodes this looks redundant, but in the general case the number of *-clone instances may be smaller than the number of nodes, and then the colocation becomes critical.

The resources are started on all nodes as soon as they are created, and if everything is fine we can begin creating shared logical volumes and file systems. clvmd keeps the metadata consistent and notifies all nodes of changes, so we perform these operations on a single node.
Initialize the partitions for use with LVM:
# pvcreate /dev/mapper/mpatha1
# pvcreate /dev/mapper/mpatha2

In general, working with clustered LVM is almost no different from regular LVM; the only difference is that when a volume group (VG) is marked as clustered, clvmd manages its metadata. Create the volume groups:
# vgcreate --clustered y shared_vg /dev/mapper/mpatha1
# vgcreate --clustered y shared_vg-ex /dev/mapper/mpatha2

The general configuration of the cluster is now complete; next we will fill it with resources. From Pacemaker's point of view a resource is any service, process or IP address that can be controlled by scripts. The resource scripts themselves are similar to init scripts and implement a set of functions: start, stop, monitor and so on. The basic principle we will follow: data needed by a resource that runs on a single node goes onto a volume in the shared_vg group with whatever file system you like; data needed on both nodes at the same time goes onto GFS2. In the first case data integrity is protected by Pacemaker, which controls the number and placement of running resources, including the file systems they use; in the second case, by the internal mechanisms of GFS2. The shared_vg-ex group will be given over entirely to the logical volume for Docker. The reason is that Docker creates a thin-provisioned volume, which can only be active in exclusive mode on a single node, and putting it into a separate group is convenient for further work and configuration.

Working with LXC in a cluster


We will work with the lxc-* utilities, which come in the lxc package. Install them:
# yum install lxc lxc-templates

We set default parameters for future containers:
# cat /etc/lxc/default.conf
lxc.start.auto  = 0
lxc.network.type = veth
lxc.network.link = br0.10
lxc.network.flags = up
# memory and swap
lxc.cgroup.memory.limit_in_bytes = 256M
lxc.cgroup.memory.memsw.limit_in_bytes = 256M
The network type will be veth: eth0 lives inside the container and is attached to the br0.10 bridge on the outside. Of the possible limits we use only the memory ones, specified above. If desired, you can set any limit supported by the kernel, following the pattern lxc.cgroup.<state-object-name> = <value>. They can also be changed on the fly with lxc-cgroup. On the file system these settings appear under /sys/fs/cgroup/TYPE/lxc/CT-NAME/object-name. An important point about the limits: memory.limit_in_bytes must be specified before memory.memsw.limit_in_bytes, and the second parameter is the sum of memory and swap, so it must be greater than or equal to the first. Otherwise the machine will start without any memory restrictions.
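For example, raising the memory limits of a running container on the fly might look like this (a sketch; note that when increasing, the memory+swap limit has to be raised before the memory limit):
# lxc-cgroup -n lxc-racktables memory.memsw.limit_in_bytes 512M
# lxc-cgroup -n lxc-racktables memory.limit_in_bytes 512M
# lxc-cgroup -n lxc-racktables memory.limit_in_bytes          # prints the current value in bytes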
Placement, starting and stopping of containers will be handled by Pacemaker, so automatic container startup is turned off (lxc.start.auto = 0).

Each LXC container will live in its own logical volume in the shared_vg group. Set the default VG name:
# cat /etc/lxc/lxc.conf
lxc.bdev.lvm.vg = shared_vg

Such placement allows a container to be started on any node of the cluster. The container configuration files also need to be shared, so create a common file system and configure its use on all nodes:
# lvcreate -L 500M -n lxc_ct shared_vg
# mkfs.gfs2 -p lock_dlm -j 2 -t Cluster:lxc_ct /dev/shared_vg/lxc_ct
We choose the lock_dlm locking protocol, since the storage is shared. We create two journals, one per node (-j 2), and set the lock table name (-t), where Cluster is the name of our cluster.

# pcs resource create fs-lxc_ct Filesystem fstype=gfs2 device=/dev/shared_vg/lxc_ct directory=/var/lib/lxc clone ordered=true interleave=true
# pcs constraint order start clvmd-clone then fs-lxc_ct-clone
We create another clone resource, this time of type Filesystem; the device and directory fields are mandatory and describe what to mount and where. We also set the start order, since without clvmd the file system cannot be mounted. After this, the directory where LXC keeps container settings is mounted on all nodes.

Create the first container:
# lxc-create -n lxc-racktables -t oracle -B lvm --fssize 2G --fstype ext4 --vgname shared_vg -- -R 6.6
Here lxc-racktables is the container name and oracle is the template used. -B selects the backing store type and its options. lxc-create creates an LVM volume and deploys the base system into it according to the template. The options after "--" are passed to the template; in my case, the release version.
At the time of writing, the centos template did not work on LVM, but the oracle one worked for me.
If you need to deploy a deb-based system, install the debootstrap utility first. The prepared system is cached under /var/cache/lxc/, and on each subsequent run lxc-create updates its packages to the current versions. It is convenient to build your own template with all the presets you need; the standard templates live in /usr/share/lxc/templates.
You can also use the special "download" template, which fetches ready-made system images from a repository.
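For example, a CentOS 7 container on LVM could be created from the pre-built images roughly like this (a sketch; lxc-test is just a throwaway name, and the distribution/release/arch values are up to you):
# lxc-create -n lxc-test -t download -B lvm --fssize 2G --fstype ext4 --vgname shared_vg -- -d centos -r 7 -a amd64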

The container is ready. You can manage containers with the lxc-* utilities: start it in the background, check its state, stop it:
# lxc-start -n lxc-racktables -d
# lxc-info -n  lxc-racktables
Name:           lxc-racktables
State:          RUNNING
PID:            9364
CPU use:        0.04 seconds
BlkIO use:      0 bytes
Memory use:     1.19 MiB
KMem use:       0 bytes
Link:           vethS7U8J1
 TX bytes:      90 bytes
 RX bytes:      90 bytes
 Total bytes:   180 bytes
# lxc-stop -n lxc-racktables

Additional container settings can be configured either from its console via lxc-console, or by mounting the container's LVM volume somewhere.
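For example, with the container stopped, its root file system can be mounted and edited directly (a sketch; /mnt/ct is an arbitrary mount point):
# mkdir -p /mnt/ct
# mount /dev/shared_vg/lxc-racktables /mnt/ct
# vi /mnt/ct/etc/sysconfig/network-scripts/ifcfg-eth0
# umount /mnt/ct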

Now we can hand control over to Pacemaker. But first, grab a fresh resource agent from GitHub:
# wget -O /usr/lib/ocf/resource.d/heartbeat/lxc https://raw.githubusercontent.com/ClusterLabs/resource-agents/master/heartbeat/lxc
# chmod +x /usr/lib/ocf/resource.d/heartbeat/lxc

The directory /usr/lib/ocf/resource.d/ contains the resource agents in a provider/type hierarchy. You can list all available resource agents with pcs resource list and view the description of a specific one with pcs resource describe.
Example:
 # pcs resource describe ocf:heartbeat:lxc
ocf:heartbeat:lxc - Manages LXC containers
 Allows LXC containers to be managed by the cluster. If the container is running "init" it will also perform an orderly shutdown. It is 'assumed' that the 'init' system will do
 an orderly shudown if presented with a 'kill -PWR' signal. On a 'sysvinit' this would require the container to have an inittab file containing "p0::powerfail:/sbin/init 0" I
 have absolutly no idea how this is done with 'upstart' or 'systemd', YMMV if your container is using one of them.
Resource options:
  container (required): The unique name for this 'Container Instance' e.g. 'test1'.
  config (required): Absolute path to the file holding the specific configuration for this container e.g. '/etc/lxc/test1/config'.
  log: Absolute path to the container log file
  use_screen: Provides the option of capturing the 'root console' from the container and showing it on a separate screen. To see the screen output run 'screen -r {container
              name}' The default value is set to 'false', change to 'true' to activate this option


So, we add a new resource to our cluster:
# pcs resource create lxc-racktables lxc container=lxc-racktables config=/var/lib/lxc/lxc-racktables/config
# pcs constraint order start fs-lxc_ct-clone then lxc-racktables
And indicate the start order.
The resource starts immediately, and its status can be checked with pcs status. If the start failed, a probable reason will be shown there. The pcs resource debug-start command lets you start a resource with its output printed to the screen:
# pcs resource debug-start lxc-racktables
Operation start for lxc-racktables (ocf:heartbeat:lxc) returned 0
 >  stderr: DEBUG: State of lxc-racktables: State:          STOPPED
 >  stderr: INFO: Starting lxc-racktables
 >  stderr: DEBUG: State of lxc-racktables: State:          RUNNING
 >  stderr: DEBUG: lxc-racktables start : 0

But be careful with it: it ignores the cluster's placement rules and starts the resource on the current node, so if the resource is already running on another node there may be surprises. The --full modifier prints a lot of additional information.
Although Pacemaker now manages the container, you can still work with it using all the lxc-* utilities, of course only on the node where it is currently running and keeping Pacemaker in mind.

The resulting container resource can be transferred to another node by doing:
# pcs resource move <resource id> [destination node]
In this case the container is shut down cleanly on one node and started on the other.
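For example, moving the container created above to the second node would look like this:
# pcs resource move lxc-racktables cluster-2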

Unfortunately, LXC does not yet have a decent live migration tool, but when one appears, migration can be configured as well. This will require another shared GFS2 partition for the dumps and a modified lxc resource agent that implements the migrate_to and migrate_from functions. I looked at the CRIU project, but could not get it working on CentOS 7.

Transferring an OpenVZ container to LXC


We create a new logical volume and copy the data of the (stopped) OpenVZ container into it:
# lvcreate -L 2G -n lxc-openvz shared_vg
# mkfs.ext4 /dev/shared_vg/lxc-openvz
# mount /dev/shared_vg/lxc-openvz /mnt/lxc-openvz
# rsync -avh --numeric-ids -e 'ssh' openvz:/vz/private/<CTID>/ /mnt/lxc-openvz/

We create the configuration file for the new container by copying the one from lxc-racktables and adjusting it:
# mkdir /var/lib/lxc/lxc-openvz
# cp /var/lib/lxc/lxc-racktables/config /var/lib/lxc/lxc-openvz/

In the configuration file, you need to change the fields:
lxc.rootfs = /dev/shared_vg/lxc-openvz
lxc.utsname = openvz
#lxc.network.hwaddr

If necessary, adjust the limits and the desired network bridge. Inside the container you also need to rewrite the network settings on the eth0 interface and fix the etc/sysconfig/network file.
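A minimal sketch of what the rewritten interface settings inside the container might look like (the addresses here are placeholders, not from my setup):
# cat etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.1.0.100
NETMASK=255.255.255.0
GATEWAY=10.1.0.254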

In principle the container can already be started at this point, but for better compatibility with LXC its contents need some finishing. As an example I used the centos container template (/usr/share/lxc/templates/lxc-centos), namely the configure_centos and configure_centos_init functions, with small refinements. Pay special attention to creating the script etc/init/power-status-changed.conf; without it the container will not be able to shut down correctly. Alternatively, the container's inittab should contain a rule like "p0::powerfail:/sbin/init 0" (this depends on the distribution).
/etc/init/power-status-changed.conf
# power-status-changed - shutdown on SIGPWR
#
start on power-status-changed

exec /sbin/shutdown -h now "SIGPWR received"

If you are too lazy to work this out yourself (though it is worth it), you can use my script (at your own risk). It is best to pin the container's MAC address in the configuration file.
Migrated containers may have console problems: a console cannot be obtained with lxc-console. In my script I solve this with agetty (alternative Linux getty), which was already present in the migrated container, and with init settings that start processes of the form:
/sbin/agetty -8 38400 /dev/console
/sbin/agetty -8 38400 /dev/tty1
The recipe and the /etc/init/ scripts were borrowed from a freshly created clean container and reworked for agetty.
/etc/init/start-ttys.conf
#
# This service starts the configured number of gettys.
#
# Do not edit this file directly. If you want to change the behaviour,
# please create a file start-ttys.override and put your changes there.

start on stopped rc RUNLEVEL=[2345]

env ACTIVE_CONSOLES=/dev/tty[1-6]
env X_TTY=/dev/tty1
task
script
    . /etc/sysconfig/init
    for tty in $(echo $ACTIVE_CONSOLES); do
        [ "$RUNLEVEL" = "5" -a "$tty" = "$X_TTY" ] && continue
        initctl start tty TTY=$tty
    done
end script

/etc/init/console.conf
# console - getty
#
# This service maintains a getty on the console from the point the system is
# started until it is shut down again.

start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]
env container

respawn
#exec /sbin/mingetty --nohangup --noclear /dev/console
exec /sbin/agetty -8 38400 /dev/console

/etc/init/tty.conf
# tty - getty
#
# This service maintains a getty on the specified device.
#
# Do not edit this file directly. If you want to change the behavior,
# please create a file tty.override and put your changes there.

stop on runlevel [S016]

respawn
instance $TTY
#exec /sbin/mingetty --nohangup $TTY
exec /sbin/agetty -8 38400 $TTY
usage 'tty TTY=/dev/ttyX - where X is console id'
I tried using mingetty in a migrated CentOS 6.6 container, but it refused to work, leaving an error in the logs:
# /sbin/mingetty --nohangup /dev/console
console: no controlling tty: Operation not permitted


You can also work with LXC through libvirt using the lxc:/// driver, but this method is risky and Red Hat threatens to remove its support from the distribution.
For management through libvirt, Pacemaker has the ocf:heartbeat:VirtualDomain resource agent, which can control any VM depending on the driver; its capabilities include live migration for KVM. I expect using Pacemaker to control KVM would look similar, but I did not need it.

Working with Docker in a cluster


Setting up Pacemaker to work with Docker is similar to the LXC setup, but there are some design differences.
First we install Docker; since it is included in the RHEL/CentOS 7 distribution, there will be no problems.
# yum install docker

Teach Docker to work with LVM. To do this, create the file /etc/sysconfig/docker-storage-setup with the following contents:
VG=shared_vg-ex
Here we specify the volume group in which Docker should create its pool. You can set additional parameters right away (man docker-storage-setup).
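For example, the pool size could be capped at the same time (a sketch; DATA_SIZE is one of the options described in man docker-storage-setup, and the value here is only an illustration):
# cat /etc/sysconfig/docker-storage-setup
VG=shared_vg-ex
DATA_SIZE=40%FREE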

We launch docker-storage-setup:
# docker-storage-setup
  Rounding up size to full physical extent 716.00 MiB
  Logical volume "docker-poolmeta" created.
  Logical volume "docker-pool" created.
  WARNING: Converting logical volume shared_vg-ex/docker-pool and shared_vg-ex/docker-poolmeta to pools data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted shared_vg-ex/docker-pool to thin pool.
  Logical volume "docker-pool" changed.
# lvs | grep docker-pool
  docker-pool     shared_vg-ex twi-aot---  17.98g             14.41  2.54 

Docker uses a thin-provisioned volume, which imposes a restriction in a cluster: such a volume cannot be active on several nodes at the same time. We therefore configure LVM so that volumes in the shared_vg-ex group are not auto-activated. To do this, explicitly list the groups (or individual volumes) that may be auto-activated in /etc/lvm/lvm.conf (on all nodes):
	auto_activation_volume_list = [ "shared_vg" ]

Now hand control of this volume group over to Pacemaker:
# pcs resource create lvm-docker-pool LVM volgrpname=shared_vg-ex exclusive=yes
# pcs constraint order start clvmd-clone then lvm-docker-pool
# pcs constraint colocation add lvm-docker-pool with clvmd-clone
Now the volumes of the shared_vg-ex group will be activated on the node where the lvm-docker-pool resource is running .

Docker will use a dedicated IP address for NATing containers to the external network. Pin it in the configuration:
# cat /etc/sysconfig/docker-network
DOCKER_NETWORK_OPTIONS="--ip=10.1.2.10 --fixed-cidr=172.17.0.0/16"
We will not configure a separate bridge; let Docker use the default docker0, we just pin the container network. I tried to specify a network of my own choosing for the containers, but ran into some obscure errors. Google suggested I was not alone, so I settled for pinning the network Docker had chosen for itself. Docker also does not bring the bridge down or remove the IP address when it stops, but since the bridge is not connected to any physical interface this is not a problem. In other configurations this has to be taken into account.

For Docker to reach the Internet through a proxy server, configure environment variables for it. Create the directory and the file /etc/systemd/system/docker.service.d/http-proxy.conf with the contents:
[Service]
Environment="http_proxy=http://ip_proxy:port" "https_proxy=http://ip_proxy:port" "NO_PROXY=localhost,127.0.0.0/8,dregistry"


The basic setup is complete; now we fill the cluster with the corresponding resources. Since a whole set of resources will be responsible for Docker, it is convenient to combine them into a group. All resources of a group run on the same node and are started sequentially in group order. Keep in mind that if one of the group's resources fails, the whole group will migrate to another node, and that when you disable a resource in a group, all resources after it are disabled as well. The first resource in the group will be the LVM volume created earlier:
# pcs resource group add docker lvm-docker-pool

Create a resource for the IP address that Docker will use:
# pcs resource create dockerIP IPaddr2 --group docker --after lvm-docker-pool  ip=10.1.2.10 cidr_netmask=24 nic=br0.12

Besides LVM volumes, Docker also stores data on a file system, so another volume has to be created under Pacemaker's control. Since this data is only needed by the running Docker, the resource is an ordinary (non-clone) one.
# lvcreate -L 500M -n docker-db shared_vg
# mkfs.xfs /dev/shared_vg/docker-db
# pcs resource create fs-docker-db  Filesystem fstype=xfs device=/dev/shared_vg/docker-db directory=/var/lib/docker --group docker --after dockerIP

Now you can add the Docker daemon itself:
# pcs resource create dockerd systemd:docker --group docker --after fs-docker-db

After the group's resources have started successfully, check Docker's status on the node where it settled and make sure everything is fine:
# docker info
Containers: 5
Images: 42
Storage Driver: devicemapper
 Pool Name: shared_vg--ex-docker--pool
 Pool Blocksize: 524.3 kB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 2.781 GB
 Data Space Total: 19.3 GB
 Data Space Available: 16.52 GB
 Metadata Space Used: 852 kB
 Metadata Space Total: 33.55 MB
 Metadata Space Available: 32.7 MB
 Udev Sync Supported: true
 Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Kernel Version: 3.10.0-229.7.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 4
Total Memory: 3.703 GiB
Name: cluster-2



You can already work with Docker in the usual way, but for completeness let's add a Docker registry to our cluster. The registry will use a separate IP address and name, 10.1.2.11 (dregistry), and the image file storage will go onto a separate volume.
# lvcreate -L 10G -n docker-registry shared_vg
# mkfs.ext4 /dev/shared_vg/docker-registry
# mkdir /mnt/docker-registry
# pcs resource create docker-registry Filesystem fstype=ext4 device=/dev/shared_vg/docker-registry directory=/mnt/docker-registry --group docker --after dockerd
# pcs resource create registryIP IPaddr2 --group docker --after docker-registry ip=10.1.2.11 cidr_netmask=24 nic=br0.12

Create the registry container on the node where Docker is running:
# docker create -p 10.1.2.11:80:5000 -e REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY=/var/lib/registry -v /mnt/docker-registry:/var/lib/registry -h dregistry --name=dregistry registry:2
Here we set up port forwarding into the container (10.1.2.11:80 → 5000), mount the /mnt/docker-registry directory into it, and set the host name and container name.
The output of docker ps -a will show the created container, ready to run.

Now we hand control of it to Pacemaker. To start with, download a fresh resource agent:
# wget -O /usr/lib/ocf/resource.d/heartbeat/docker https://raw.githubusercontent.com/ClusterLabs/resource-agents/master/heartbeat/docker
# chmod +x /usr/lib/ocf/resource.d/heartbeat/docker

It is important to keep the resource agents identical on all nodes, otherwise there may be surprises. The docker resource agent itself can create the required containers with given parameters by pulling them from the registry, so you could simply run Docker permanently on every cluster node with a common registry and per-node repositories, and let Pacemaker manage only individual containers. But that is less interesting, and redundant, and I have not yet decided what to do with a single Docker instance.
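A quick way to check that the agents match on both nodes (assuming ssh access between them):
# md5sum /usr/lib/ocf/resource.d/heartbeat/docker
# ssh cluster-2 md5sum /usr/lib/ocf/resource.d/heartbeat/docker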
So, we will transfer control of the finished container to Pacemaker.
# pcs resource create dregistry docker reuse=true image="docker.io/registry:2" --group docker --after registryIP
reuse=true is an important parameter; without it the container is deleted after it stops. In image you must specify the full coordinates of the image, including registry and tag. The resource agent will pick up the existing container named dregistry and run it.
Register our local registry in the Docker configuration on the cluster nodes (/etc/sysconfig/docker):
ADD_REGISTRY='--add-registry dregistry'
INSECURE_REGISTRY='--insecure-registry dregistry'
We do not need HTTPS, so we disable it for the local registry.

After that, restart the Docker service with systemctl restart docker on the node where it lives, or with pcs resource restart dockerd from any node in the cluster. Now we can use our personal registry at 10.1.2.11 (dregistry).

Now, using the Docker containers as an example, I will show how to work with sets and templates in Pacemaker. Unfortunately, pcs is quite limited here: it cannot handle templates at all, and while it can create some constraint sets, working with them through pcs is inconvenient. Fortunately, the ability to edit the cluster configuration directly as XML comes to the rescue:
# pcs cluster cib > /tmp/cluster.xml
# edit whatever is needed
# pcs cluster cib-push /tmp/cluster.xml


Docker resources must meet the following requirements:
  • run on the same node as the docker group resources
  • start only after all docker group resources
  • not depend on each other's status

Of course, this could be achieved by hanging constraints on each container with pcs constraint, but the resulting configuration is cumbersome and hard to read.

To begin with, we create three test containers, for which I used an Nginx image. The image is pulled in advance and pushed to the local registry:
# docker pull nginx:latest
# docker push nginx:latest
# pcs resource create doc-test3 docker reuse=false image="dregistry/nginx:latest" --disabled
# pcs resource create doc-test2 docker reuse=true image="dregistry/nginx:latest" --disabled
# pcs resource create doc-test docker reuse=true image="dregistry/nginx:latest" --disabled
The resources are created disabled, otherwise they would immediately try to start, and then it is luck of the draw which node they land on.

In the freshly dumped XML we add the resource sets. First, the colocation definition (in the constraints section):
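Roughly, the colocation set looks like this (a sketch; the element IDs match the pcs constraint output shown further below):
<rsc_colocation id="docker-col" score="INFINITY">
  <resource_set id="docker-col-0" role="Started" sequential="false" require-all="false">
    <resource_ref id="doc-test"/>
    <resource_ref id="doc-test2"/>
    <resource_ref id="doc-test3"/>
  </resource_set>
  <resource_set id="docker-col-1">
    <resource_ref id="docker"/>
  </resource_set>
</rsc_colocation>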
Here an overall colocation set (id="docker-col") is created, requiring its members to live on the same node (score="INFINITY"). The first resource set (id="docker-col-0") has the following properties:
  • resources start simultaneously (sequential="false")
  • resources do not depend on each other's status (require-all="false")
  • the resource role (role="Started")

The resource_ref tag refers to existing cluster resources that should be included in this set. The role="Started" parameter is critical.
The second resource set (id="docker-col-1") contains all the resources of the docker group.
The logic of the role parameter in this construct is not entirely clear to me, but it has to be this way (verified by experiment).

And the ordering set, which defines the start order of the resources:
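Again roughly (a sketch matching the IDs in the pcs output below):
<rsc_order id="order_doc">
  <resource_set id="order_doc-0">
    <resource_ref id="docker"/>
  </resource_set>
  <resource_set id="order_doc-1" sequential="false" require-all="false">
    <resource_ref id="doc-test"/>
    <resource_ref id="doc-test2"/>
    <resource_ref id="doc-test3"/>
  </resource_set>
</rsc_order>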
This one is more transparent: first the resources of the docker group are started, then all the containers at once, independently of one another.

After loading the modified configuration, the containers can be enabled, disabled and removed without fear; this does not affect the operation of the other resources. However, the colocation requirement will force all related resources to move to another cluster node if any one of them is moved. In the pcs output these settings look as follows:
 # pcs constraint --full | grep -i set
  Resource Sets:
    set docker (id:order_doc-0) set doc-test doc-test2 doc-test3 sequential=false require-all=false (id:order_doc-1) (id:order_doc)
  Resource Sets:
    set doc-test doc-test2 doc-test3 role=Started sequential=false require-all=false (id:docker-col-0) set docker (id:docker-col-1) setoptions score=INFINITY (id:docker-col)


Working with resource templates is done in a similar way. Let's create a template for the LXC container (in the resources section):
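A sketch of such a template (I am assuming it carries the agent class/provider/type plus a monitor operation; adjust to taste):
<template id="lxc-template" class="ocf" provider="heartbeat" type="lxc">
  <operations>
    <op id="lxc-template-monitor-30s" name="monitor" interval="30s"/>
  </operations>
</template>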

It holds the common parameters of the resource. We then rewrite the resource definition to use this template:
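Again as a sketch, the primitive now only references the template and keeps its own instance attributes:
<primitive id="lxc-racktables" template="lxc-template">
  <instance_attributes id="lxc-racktables-instance_attributes">
    <nvpair id="lxc-racktables-instance_attributes-container" name="container" value="lxc-racktables"/>
    <nvpair id="lxc-racktables-instance_attributes-config" name="config" value="/var/lib/lxc/lxc-racktables/config"/>
  </instance_attributes>
</primitive>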

After updating the configuration, the resource properties output becomes quite short:
# pcs resource show lxc-racktables
 Resource: lxc-racktables (template=lxc-template)
  Attributes: container=lxc-racktables config=/var/lib/lxc/lxc-racktables/config

However, creating new resources that use templates is not yet possible with pcs.
It remains to fix the start order of the LXC containers with a similar set, as was done for Docker.

To manage Pacemaker you can also install the crmsh package from the opensuse.org repository, but you may have to fiddle with its dependencies.

The cluster now looks like this:
# pcs status
Cluster name: Cluster
Last updated: Thu Jul 16 12:29:33 2015
Last change: Thu Jul 16 10:23:40 2015
Stack: corosync
Current DC: cluster-1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
19 Resources configured

Online: [ cluster-1 cluster-2 ]

Full list of resources:

cluster-1.stonith (stonith:fence_ipmilan): Started cluster-2
cluster-2.stonith (stonith:fence_ipmilan): Started cluster-1
Clone Set: dlm-clone [dlm]
Started: [ cluster-1 cluster-2 ]
Clone Set: clvmd-clone [clvmd]
Started: [ cluster-1 cluster-2 ]
Clone Set: fs-lxc_ct-clone [fs-lxc_ct]
Started: [ cluster-1 cluster-2 ]
lxc-racktables (ocf::heartbeat:lxc): Started cluster-1
Resource Group: docker
lvm-docker-pool (ocf::heartbeat:LVM): Started cluster-2
dockerIP (ocf::heartbeat:IPaddr2): Started cluster-2
fs-docker-db (ocf::heartbeat:Filesystem): Started cluster-2
dockerd (systemd:docker): Started cluster-2
docker-registry (ocf::heartbeat:Filesystem): Started cluster-2
registryIP (ocf::heartbeat:IPaddr2): Started cluster-2
dregistry (ocf::heartbeat:docker): Started cluster-2
doc-test (ocf::heartbeat:docker): Started cluster-2
doc-test2 (ocf::heartbeat:docker): Started cluster-2
doc-test3 (ocf::heartbeat:docker): Stopped

PCSD Status:
cluster-1: Online
cluster-2: Online

Daemon Status:
corosync: active/disabled
pacemaker: active/enabled
pcsd: active/enabled

# pcs constraint
Location Constraints:
Resource: cluster-1.stonith
Disabled on: cluster-1 (score:-INFINITY)
Resource: cluster-2.stonith
Disabled on: cluster-2 (score:-INFINITY)
Resource: doc-test
Enabled on: cluster-2 (score:INFINITY) (role: Started)
Ordering Constraints:
start dlm-clone then start clvmd-clone (kind:Mandatory)
start clvmd-clone then start fs-lxc_ct-clone (kind:Mandatory)
start fs-lxc_ct-clone then start lxc-racktables (kind:Mandatory)
start clvmd-clone then start lvm-docker-pool (kind:Mandatory)
Resource Sets:
set docker set doc-test doc-test2 doc-test3 sequential=false require-all=false
Colocation Constraints:
clvmd-clone with dlm-clone (score:INFINITY)
fs-lxc_ct-clone with clvmd-clone (score:INFINITY)
lvm-docker-pool with clvmd-clone (score:INFINITY)
Resource Sets:
set doc-test doc-test2 doc-test3 role=Started sequential=false require-all=false set docker setoptions score=INFINITY


Thank you for your attention.

Cheat Sheet


View the parameters of a specific resource:
# pcs resource show <resource id>

Update or add resource parameters, for example:
# pcs resource update <resource id> op start start-delay="3s" interval=0s timeout=90
You must specify the complete list of parameters for the operation being updated, including the existing ones, because update overwrites them.

Restart a resource:
# pcs resource restart <resource id> [node]
When restarting a clone resource, be sure to specify the node explicitly, otherwise it will restart wherever it pleases.

View and reset the failure counter of a specific resource:
# pcs resource failcount show <resource id> [node]
# pcs resource failcount reset <resource id> [node]
This is very useful when a resource has gone into a failed state and Pacemaker is waiting for your action before working with it again.

Clear the errors and status of a resource:
# pcs resource cleanup [<resource id>]
After this, Pacemaker picks the resource back up.

Disable/enable a resource:
# pcs resource disable [<resource id>]
# pcs resource enable [<resource id>]
This stops/starts the corresponding process, service or setting.

Move a resource to another node:
# pcs resource move <resource id> [destination node]
If the resource agent does not implement the migration functions, the resource is cleanly stopped on its current node and started on the target node. If a resource was moved to another node and failed to start there, it moves back. In doing so, a constraint rule of the following form is added:
  Resource: docker
    Enabled on: cluster-2 (score:INFINITY) (role: Started) (id:cli-prefer-docker)
with the target node specified. After debugging, this rule has to be removed manually.

View the full list of constraints and dependencies:
# pcs constraint --full

Remove a constraint by the id obtained above:
# pcs constraint remove <constraint id>

Dump and load a modified cluster configuration:
# pcs cluster cib > /tmp/cluster.xml
# pcs cluster cib-push /tmp/cluster.xml


References


