Active/Passive Cluster With Pacemaker, Corosync and DRBD on CentOS 7: Part 4 – Configure Fencing (STONITH)

The following is part 4 of a four-part series covering the installation and configuration of Pacemaker, Corosync, Apache, DRBD and a VMware STONITH agent.

Pacemaker Fencing Agents

Fencing is a very important concept in high-availability computer clusters. Unfortunately, because fencing does not provide a service that is visible to users, it is often neglected.

There are two kinds of fencing: resource level and node level.

Node level fencing makes sure that a node does not run any resources at all. This is the fencing method we are going to use in this article: the node will simply be rebooted via vCentre.

Be advised that node level fencing configuration depends heavily on the environment. Commonly used STONITH devices include remote management services such as Dell DRAC and HP iLO (lights-out devices), uninterruptible power supplies (UPS) and blade control devices.

STONITH (Shoot The Other Node In The Head) is Pacemaker’s fencing implementation.

Fencing Agents Available

To see what packages are available for fencing, run the following command:

[pcmk01]# yum search fence-

You should get a couple of dozen agents listed.

Fencing Agent for APC

For those with an APC network power switch, the fence_apc agent is likely the best bet. It logs into the device via telnet/SSH and reboots a specified outlet.

Although such configuration is not covered in this article, APC UPS equipment is commonly used and therefore worth mentioning.
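As a rough illustration only (this article does not cover APC hardware), an invocation might look like the following. The address, credentials and outlet number are all hypothetical, and the command obviously requires a real APC switch to run:

```shell
# Hypothetical example: reboot outlet 3 on an APC switch at apc.local.
# --plug selects the outlet; --action reboot power-cycles it.
fence_apc --ip apc.local \
  --username="apc-account" --password="passwd" \
  --plug 3 --action reboot
```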

Fencing Agent for VMware

As we run virtual machines on a VMware platform, we are going to use the fence_vmware_soap fencing agent:

[pcmk01]# yum search fence-|grep -i vmware
fence-agents-vmware-soap.x86_64 : Fence agent for VMWare with SOAP API v4.1+

Be sure to install the package on all cluster nodes:

[ALL]# yum install -y fence-agents-vmware-soap

Fencing Agent Script

We usually have to find the correct STONITH agent script; in our case, however, there should be just a single fence agent available:

[pcmk01]# pcs stonith list
fence_vmware_soap - Fence agent for VMWare over SOAP API

Find the parameters associated with the device:

[pcmk01]# pcs stonith describe fence_vmware_soap

It’s always handy to have a list of mandatory parameters:

[pcmk01]# pcs stonith describe fence_vmware_soap|grep required
  port (required): Physical plug number, name of virtual machine or UUID
  ipaddr (required): IP Address or Hostname
  action (required): Fencing Action
  login (required): Login Name

We can check the device’s metadata as well:

[pcmk01]# stonith_admin -M -a fence_vmware_soap

To configure this VMware fencing device, we need vCentre credentials with minimal permissions. Once we have the credentials, we can list the virtual machines available on VMware:

[pcmk01]# fence_vmware_soap --ip vcentre.local --ssl --ssl-insecure --action list \
  --username="vcentre-account" --password="passwd" | grep pcmk


  1. ip – the IP address or hostname of the vCentre server,
  2. username – the vCentre username,
  3. password – the vCentre password,
  4. action – the fencing action to use,
  5. ssl – use an SSL connection,
  6. ssl-insecure – use an SSL connection without verifying the fence device’s certificate.
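The same agent can also query a single machine rather than listing everything. A sketch, assuming vm-pcmk02 is one of the VM names returned by the list command above:

```shell
# Query the power state of a single VM instead of listing all of them.
# --plug takes the VM name (the "port" parameter of the agent).
fence_vmware_soap --ip vcentre.local --ssl --ssl-insecure \
  --username="vcentre-account" --password="passwd" \
  --action status --plug vm-pcmk02
```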

The vCentre account, which we use above, has the “Power On/Power Off” role on VMware and is allowed to access all machines that we are using in the series.

Configure Fencing (STONITH)

Get a local copy of the CIB:

[pcmk01]# pcs cluster cib stonith_cfg

Create a new STONITH resource called my_vcentre-fence:

[pcmk01]# pcs -f stonith_cfg stonith create my_vcentre-fence fence_vmware_soap \
 ipaddr=vcentre.local ipport=443 ssl_insecure=1 inet4_only=1 \
 login="vcentre-account" passwd="passwd" \
 action=reboot \
 pcmk_host_map="pcmk01-cr:vm-pcmk01;pcmk02-cr:vm-pcmk02" \
 pcmk_host_check=static-list \
 pcmk_host_list="vm-pcmk01,vm-pcmk02" \
 power_wait=3 op monitor interval=60s

We use pcmk_host_map to map host names to port numbers for devices that do not support host names.
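The pcmk_host_map value is a semicolon-separated list of nodename:port pairs. A quick one-liner to eyeball the mapping before committing it, using the value from the command above:

```shell
# Split the pcmk_host_map value into one "node:VM" pair per line,
# so each Corosync node name can be checked against its VM name.
echo "pcmk01-cr:vm-pcmk01;pcmk02-cr:vm-pcmk02" | tr ';' '\n'
```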

The host names should be the ones used for the Corosync interface! Make sure pcmk_host_map contains the Corosync interface names. Otherwise, if the Corosync interface is down, you may get the following errors:

pcmk01 stonith-ng[23308]: notice: my_vcentre-fence can not fence (reboot) pcmk02-cr: static-list
pcmk01 stonith-ng[23308]: notice: Couldn't find anyone to fence (reboot) pcmk02-cr with any device
pcmk01 stonith-ng[23308]: error: Operation reboot of pcmk02-cr by  for [email protected]: No such device
pcmk01 crmd[23312]: notice: Initiating action 47: start my_vcentre-fence_start_0 on pcmk01-cr (local)
pcmk01 crmd[23312]: notice: Stonith operation 6/50:46:0:bf22c892-cf13-44b2-8fc6-67d13c05f4d4: No such device (-19)
pcmk01 crmd[23312]: notice: Stonith operation 6 for pcmk02-cr failed (No such device): aborting transition.
pcmk01 crmd[23312]: notice: Transition aborted: Stonith failed (source=tengine_stonith_callback:695, 0)
pcmk01 crmd[23312]: notice: Peer pcmk02-cr was not terminated (reboot) by  for pcmk01-cr: No such device

Enable STONITH and commit the changes:

[pcmk01]# pcs -f stonith_cfg property set stonith-enabled=true
[pcmk01]# pcs cluster cib-push stonith_cfg

Check the STONITH status and review the cluster resources:

[pcmk01]# pcs stonith show
 my_vcentre-fence       (stonith:fence_vmware_soap):    Started pcmk01-cr
[pcmk01]# pcs status
Cluster name: test_webcluster
Last updated: Sun Dec 13 16:24:14 2015          Last change: Sun Dec 13 16:23:42 2015 by root via cibadmin on pcmk01-cr
Stack: corosync
Current DC: pcmk02-cr (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 6 resources configured

Online: [ pcmk01-cr pcmk02-cr ]

Full list of resources:

 Resource Group: my_webresource
     my_VIP     (ocf::heartbeat:IPaddr2):       Started pcmk02-cr
     my_website (ocf::heartbeat:apache):        Started pcmk02-cr
 Master/Slave Set: MyWebClone [my_webdata]
     Masters: [ pcmk02-cr ]
     Slaves: [ pcmk01-cr ]
 my_webfs       (ocf::heartbeat:Filesystem):    Started pcmk02-cr
 my_vcentre-fence       (stonith:fence_vmware_soap):    Started pcmk01-cr

PCSD Status:
  pcmk01-cr: Online
  pcmk02-cr: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Test Fencing

Reboot the second cluster node, making sure that you use the Corosync interface name:

[pcmk01]# stonith_admin --reboot pcmk02-cr

We can also test fencing by killing the Corosync interface on pcmk01 and observing the node being fenced:

[pcmk02]# tail -f /var/log/messages
[pcmk01]# ifdown $(ip ro|grep "172\.16\.21"|awk '{print $3}')
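Once the fenced node comes back online, the recorded fencing actions can be reviewed from a surviving node; a sketch, assuming the stonith_admin history option available in this Pacemaker version:

```shell
# Show fencing actions recorded for pcmk02-cr (use "*" for all nodes).
stonith_admin --history pcmk02-cr
```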

