pacemaker: how to finish off a downed node

For certain types of resources it is critical that no more than one client uses the resource at the same time. DRBD is a classic example: you cannot allow a DRBD device to be mounted read-write on two systems at once. The same goes for shared disk arrays that are connected to multiple servers.

Pacemaker itself takes care of this, but there are situations when pacemaker decides that a resource needs to be moved yet cannot deliver the shutdown command to the other node (for example, loss of network connectivity when iSCSI runs over a separate network, and so on). To handle such cases STONITH (Shoot The Other Node In The Head) is used. In pacemaker it is configured as a resource and solves many of these problems.

The initial configuration will be simple:
  • node1.eth, node2.eth - addresses of the nodes on which the cluster is built
  • node1.ipmi, node2.ipmi - IP addresses of the hosts' IPMI interfaces
  • FS - a resource that requires high availability (see the sketch right after this list)
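
The FS resource itself is outside the scope of this article. Purely for reference, a minimal sketch of how such a Filesystem resource might be created is shown below; the DRBD device, mount point and filesystem type are hypothetical placeholders:
pcs resource create FS ocf:heartbeat:Filesystem device="/dev/drbd0" directory="/mnt/data" fstype="ext4" op monitor interval=20s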


As a first step, to avoid problems, we dump the current configuration (CIB) into a file:
pcs cluster cib stonith.xml
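
The -f flag, used throughout the rest of this article, makes pcs operate on this file instead of the live CIB. As a quick sanity check that the dump contains the expected resources you can, for example, run (in newer pcs releases the subcommand is resource status rather than resource show):
pcs -f stonith.xml resource show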


STONITH must be enabled on the cluster, and quorum must be disabled (because the cluster consists of only two nodes). Make sure of this:
# pcs -f stonith.xml property show
...
 no-quorum-policy: ignore
 stonith-enabled: true
...

If this is not the case, set these properties:
pcs -f stonith.xml property set stonith-enabled=true
pcs -f stonith.xml property set no-quorum-policy=ignore
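
As an aside: on clusters running corosync 2.x, the two-node case is often handled at the corosync level instead, via the votequorum two_node option. A sketch of the relevant quorum section of /etc/corosync/corosync.conf is shown below; whether to use it or no-quorum-policy=ignore depends on your stack:
quorum {
    provider: corosync_votequorum
    # two_node implicitly enables wait_for_all: the cluster becomes
    # quorate for the first time only after both nodes have been seen
    two_node: 1
}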


Then we create the IPMI stonith resources (a complete list of available stonith agents is printed by pcs stonith list, and the full list of parameters for a given agent by pcs stonith describe <agent>):
pcs -f stonith.xml stonith create node1.stonith  fence_ipmilan ipaddr="node1.ipmi" passwd="xXx" login="xXx" action="reboot" method="cycle" pcmk_host_list="node1.eth" pcmk_host_check=static-list stonith-timeout=10s  op monitor interval=10s
pcs -f stonith.xml stonith create node2.stonith  fence_ipmilan ipaddr="node2.ipmi" passwd="xXx" login="xXx" action="reboot" method="cycle" pcmk_host_list="node2.eth" pcmk_host_check=static-list stonith-timeout=10s  op monitor interval=10s

Particular attention should be paid to two parameters: ipaddr and pcmk_host_list. The first tells the agent at which address the node's IPMI interface can be reached, and the second lists which nodes the created resource is allowed to finish off.
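
Before relying on these resources, it is worth checking that the IPMI credentials actually work. A quick manual test with the same fence agent might look like this (short option names can differ slightly between fence-agents versions; add -P if your BMC requires lanplus):
fence_ipmilan -a node1.ipmi -l xXx -p xXx -o status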

Since stonith is, from pacemaker's point of view, an ordinary resource, it can migrate just like any other resource. It would be very unpleasant if the process responsible for rebooting node2 ended up running on node2 itself. Therefore we forbid each stonith resource from landing on the node it is supposed to reboot:
pcs -f stonith.xml constraint location node1.stonith avoids node1.eth=INFINITY
pcs -f stonith.xml constraint location node2.stonith avoids node2.eth=INFINITY


Setup is complete. Push the configuration into pacemaker (in newer pcs versions the command is pcs cluster cib-push stonith.xml):
pcs cluster push cib stonith.xml


After that, a simple check (note that it really does reboot node1.eth)
stonith_admin -t 20 --reboot node1.eth

will confirm that everything turned out right.
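
If anything looks suspicious, the fencing subsystem can be inspected directly, for example (exact options depend on the pacemaker version):
stonith_admin --list-registered
stonith_admin --history node1.eth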

The final configuration should look something like this:

# pcs status
Online: [ node1.eth node2.eth ]
Full list of resources:
 FS     (ocf::heartbeat:Filesystem):    Started node2.eth
 node1.stonith     (stonith:fence_ipmilan):        Started node2.eth
 node2.stonith     (stonith:fence_ipmilan):        Started node1.eth
# pcs constraint location
Location Constraints:
  Resource: node1.stonith
    Disabled on: node1.eth
  Resource: node2.stonith
    Disabled on: node2.eth
#  pcs property show
 no-quorum-policy: ignore
 stonith-enabled: true
