Pacemaker + DRBD (Dual primary) + ctdb cluster storage
Good day, Habr readers. I was given a task: deploy a fault-tolerant, highly available storage using pacemaker + drbd (in dual-primary mode) + clvmd + ctdb, which will be mounted on a server. I should say up front that I am working with all of these tools for the first time and will be glad to hear criticism, additions, and corrections. Instructions for this particular stack either don't exist online or are outdated. The setup below works at the moment, but there is one problem whose solution I hope to find soon. All actions must be performed on both nodes unless otherwise indicated.
Let's get started. We have two virtual machines running CentOS 7.
1) For reliability, we add them to /etc/hosts:
192.168.0.1 node1
192.168.0.2 node2
2) DRBD is not in the standard repositories, so we need to add a third-party one (ELRepo):
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
3) Install drbd version 8.4 (I didn't manage to get 9.0 working in dual-primary mode):
yum install -y kmod-drbd84 drbd84-utils
4) Load the drbd kernel module and add it to autoload:
modprobe drbd
echo drbd > /etc/modules-load.d/drbd.conf
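A quick check that the module actually loaded:
lsmod | grep drbd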
5) Create the drbd resource configuration file /etc/drbd.d/r0.res:
resource r0 {
    protocol C;
    device /dev/drbd0;
    meta-disk internal;
    disk /dev/sdb;
    net {
        allow-two-primaries;
    }
    disk {
        fencing resource-and-stonith;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    on node1 {
        address 192.168.0.1:7788;
    }
    on node2 {
        address 192.168.0.2:7788;
    }
}
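Before going further, it is worth checking that the file parses cleanly:
drbdadm dump r0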
6) Disable the drbd systemd unit (Pacemaker will manage it later), create metadata for the drbd disk, and bring the resource up:
systemctl disable drbd
drbdadm create-md r0
drbdadm up r0
7) On the first node, make the resource primary:
drbdadm primary --force r0
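The initial synchronization and the current roles can be checked like this (the second node stays Secondary for now; Pacemaker will promote both nodes later via master-max=2):
cat /proc/drbd
drbdadm role r0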
8) Install Pacemaker:
yum install -y pacemaker pcs resource-agents
9) Set a password for the hacluster user, which is used for authentication between the nodes:
echo CHANGEME | passwd --stdin hacluster
10) Enable and start pcsd on both nodes:
systemctl enable pcsd
systemctl start pcsd
11) Authenticate the cluster nodes. From this stage on, we do everything on one node only:
pcs cluster auth node1 node2 -u hacluster
12) Create a cluster named samba_cluster
pcs cluster setup --force --name samba_cluster node1 node2
13) Enable and start the cluster on all nodes:
pcs cluster enable --all
pcs cluster start --all
14) Since we use virtual machines as servers, we disable the STONITH mechanism
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
15) Create a VIP
pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.0.10 cidr_netmask=24 op monitor interval=60s
16) Create the drbd resource in Pacemaker:
pcs cluster cib drbd_cfg
pcs -f drbd_cfg resource create DRBD ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
pcs -f drbd_cfg resource master DRBDClone DRBD master-max=2 master-node-max=1 clone-node-max=1 clone-max=2 notify=true interleave=true
pcs cluster cib-push drbd_cfg
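Once the initial sync has finished, pcs status should show DRBDClone with both nodes as Masters:
pcs status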
17) Install the necessary clvm packages and prepare clvm
yum install -y lvm2-cluster gfs2-utils
/sbin/lvmconf --enable-cluster
18) Add the dlm and clvmd resources in Pacemaker:
pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence clone interleave=true ordered=true
pcs constraint colocation add clvmd-clone with dlm-clone
19) At this stage, starting clvmd and dlm will generate an error. Go to the Pacemaker web interface at 192.168.0.1:2224. If the cluster does not appear there, add it via "Add existing". Then go to Resources - dlm - optional arguments and set allow_stonith_disabled = true.
20) Set the resource start order:
pcs constraint order start DRBDClone then dlm-clone
pcs constraint order start dlm-clone then clvmd-clone
21) Disable the LVM cache and clear the existing cache files. On both nodes:
sed -i 's/write_cache_state = 1/write_cache_state = 0/' /etc/lvm/lvm.conf
rm /etc/lvm/cache/*
22) Edit /etc/lvm/lvm.conf so that LVM does not see /dev/sdb. On both nodes:
# This configuration option has an automatic default value.
# filter = [ "a|.*/|" ]
filter = [ "r|^/dev/sdb$|" ]
23) Create a clustered LVM volume group and a logical volume on it. Only on one node:
$ vgcreate -Ay -cy cl_vg /dev/drbd0
Physical volume "/dev/drbd0" successfully created.
Clustered volume group "cl_vg" successfully created
$ lvcreate -l100%FREE -n r0 cl_vg
Logical volume "r0" created.
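A quick sanity check that the clustered VG and LV are visible (run it on both nodes):
vgs cl_vg
lvs cl_vg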
24) Format the volume as GFS2 (-j2 creates two journals for the two nodes; the first part of the -t lock table name must match the cluster name):
mkfs.gfs2 -j2 -p lock_dlm -t samba_cluster:r0 /dev/cl_vg/r0
25) Next we add a Pacemaker resource that mounts this filesystem and order it to start after clvmd:
pcs resource create fs ocf:heartbeat:Filesystem device="/dev/cl_vg/r0" directory="/mnt/" fstype="gfs2" --clone
pcs constraint order start clvmd-clone then fs-clone
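If everything went well, the GFS2 filesystem should now be mounted on both nodes:
df -h /mnt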
26) Now it's time for ctdb, which will run samba:
yum install -y samba ctdb cifs-utils
27) Edit the config /etc/ctdb/ctdbd.conf
CTDB_RECOVERY_LOCK="/mnt/ctdb/.ctdb.lock"
CTDB_NODES=/etc/ctdb/nodes
CTDB_MANAGES_SAMBA=yes
CTDB_LOGGING=file:/var/log/ctdb.log
CTDB_DEBUGLEVEL=NOTICE
28) Create a file with the list of nodes. ATTENTION! Every IP in the node list must be followed by a newline, including the last one. Otherwise the node will fail at initialization.
cat /etc/ctdb/nodes
192.168.0.1
192.168.0.2
29) Add the following to the /etc/samba/smb.conf configuration:
[global]
clustering = yes
private dir = /mnt/ctdb
lock directory = /mnt/ctdb
idmap backend = tdb2
passdb backend = tdbsam
[test]
comment = Cluster Share
path = /mnt
browseable = yes
writable = yes
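Note that the /mnt/ctdb directory referenced by the recovery lock and the private/lock directories above has to exist on the shared filesystem before CTDB starts; if it has not been created yet, do it once on either node:
mkdir -p /mnt/ctdb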
30) Finally, we create the ctdb resource and tell it to start after the filesystem mount.
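The create command itself is missing from the listing; a plausible sketch, assuming CTDB is run through its systemd unit and matching the resource name samba used in the constraint below:
pcs resource create samba systemd:ctdb op monitor interval=60s
and then the ordering constraint: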
pcs constraint order start fs-clone then samba
And now about the problem I have not yet solved. If a node is rebooted, the whole stack falls apart, because drbd needs time to bring up /dev/drbd0; DLM does not see the volume because it is not activated yet, does not start, and so on down the chain. The workaround is to activate the volume group manually and refresh the Pacemaker resources:
vgchange -a y
pcs resource refresh