Active/Passive Cluster With Pacemaker, Corosync and DRBD on CentOS 7: Part 3 – Replicate Storage with DRBD

The following is part 3 of a 4-part series that covers the installation and configuration of Pacemaker, Corosync, Apache, DRBD and a VMware STONITH agent.

Before We Begin

We are going to use DRBD to store web content for Apache. Be advised that this is one of those situations where DRBD is likely not the best choice for the storage need (see here for more info: https://fghaas.wordpress.com/2007/06/26/when-not-to-use-drbd/); a simple rsync would do just fine. In production, you would typically use DRBD for a backend data store rather than for frontend web content.

The convention followed in the series is that [ALL]# denotes a command that needs to be run on all cluster machines.

Replicate Cluster Storage Using DRBD

It is recommended, though not strictly required, that you run your DRBD replication over a dedicated connection.

It is generally not recommended to run DRBD replication via routers, for reasons of fairly obvious performance drawbacks (adversely affecting both throughput and latency).

We use a dedicated VLAN for DRBD in this article.
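
The replication addresses used later in this article sit on the 172.16.22.0/24 subnet. A quick sanity check, assuming that subnet is carried by the dedicated VLAN interface (interface names will differ between environments), is to confirm the address is configured on each node and that the peer is reachable over it:

[ALL]# ip addr show | grep 'inet 172\.16\.22\.'
[pcmk01]# ping -c 3 172.16.22.12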

DRBD Installation

Import the ELRepo package signing key, enable the repository and install the DRBD kernel module with utilities:

[ALL]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
[ALL]# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
[ALL]# yum install -y kmod-drbd84 drbd84-utils
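
Once the packages are installed, it does not hurt to confirm which DRBD 8.4 packages came from ELRepo:

[ALL]# rpm -qa | grep drbd84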

To avoid issues with SELinux, for the time being, we are going to exempt DRBD processes from SELinux control:

[ALL]# semanage permissive -a drbd_t
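
The semanage command is provided by the policycoreutils-python package on CentOS 7, so install that first if it is missing; the list of permissive domains can then be used to confirm the exemption:

[ALL]# yum install -y policycoreutils-python
[ALL]# semanage permissive -l | grep drbd_t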

LVM Volume for DRBD

Create a volume group on /dev/sdb and a 256MB logical volume for DRBD:

[ALL]# vgcreate vg_drbd /dev/sdb
[ALL]# lvcreate --name lv_drbd --size 256M vg_drbd
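
This assumes /dev/sdb is an empty disk dedicated to DRBD. The newly created volume group and logical volume can be verified with:

[ALL]# vgs vg_drbd
[ALL]# lvs vg_drbd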

DRBD Features: Single-primary and Dual-primary modes

In single-primary mode, a resource is, at any given time, in the primary role on only one cluster member. Since it is guaranteed that only one cluster node manipulates the data at any moment, this mode can be used with any conventional file system (ext4, XFS).

Deploying DRBD in single-primary mode is the canonical approach for high availability (failover capable) clusters. This is the mode that we are going to use in our failover cluster.

In dual-primary mode, a resource is, at any given time, in the primary role on both cluster nodes. Since concurrent access to the data is thus possible, this mode requires the use of a shared cluster file system that utilizes a distributed lock manager. Examples include GFS and OCFS2.

Deploying DRBD in dual-primary mode is the preferred approach for load-balancing clusters which require concurrent data access from two nodes. This mode is disabled by default, and must be enabled explicitly in DRBD’s configuration file.

DRBD Replication Modes

DRBD supports three distinct replication modes, allowing three degrees of replication synchronicity.

  1. Protocol A. Asynchronous replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has finished, and the replication packet has been placed in the local TCP send buffer.
  2. Protocol B. Memory synchronous (semi-synchronous) replication protocol. Local write operations on the primary node are considered completed as soon as the local disk write has occurred, and the replication packet has reached the peer node.
  3. Protocol C. Synchronous replication protocol. Local write operations on the primary node are considered completed only after both the local and the remote disk write have been confirmed.

The most commonly used replication protocol in DRBD setups is protocol C.

Configure DRBD

Configure DRBD to use single-primary mode with replication protocol C:

[ALL]# cat << EOL > /etc/drbd.d/webdata.res
resource webdata {
 protocol C;
 meta-disk internal;
 device /dev/drbd0;
 disk   /dev/vg_drbd/lv_drbd;
 handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";
 }
 net {
  allow-two-primaries no;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
  rr-conflict disconnect;
 }
 disk {
  on-io-error detach;
 }
 syncer {
  verify-alg sha1;
 }
 on pcmk01 {
  address  172.16.22.11:7789;
 }
 on pcmk02 {
  address  172.16.22.12:7789;
 }
}
EOL

We have a resource named webdata which uses /dev/vg_drbd/lv_drbd as the lower-level device, and is configured with internal meta data.

The resource uses TCP port 7789 for its network connections, and binds to the IP addresses 172.16.22.11 and 172.16.22.12, respectively.

In case we run into problems, we have to ensure that TCP port 7789 is open on the firewall for the DRBD interface and that the resource name matches the file name.
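
Assuming firewalld (the CentOS 7 default) is in use, opening the replication port on both nodes looks roughly like this; a rich rule limited to the 172.16.22.0/24 replication subnet would be a tighter alternative:

[ALL]# firewall-cmd --permanent --add-port=7789/tcp
[ALL]# firewall-cmd --reload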

Create the local metadata for the DRBD resource:

[ALL]# drbdadm create-md webdata

Ensure the DRBD kernel module is loaded:

[ALL]# lsmod|grep drbd
drbd                  392583  0
libcrc32c              12644  1 drbd
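
If the module is not listed, load it manually:

[ALL]# modprobe drbd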

Finally, bring up the DRBD resource:

[ALL]# drbdadm up webdata

For data consistency, tell DRBD which node should be considered to have the correct data (can be run on any node as both have garbage at this point):

[pcmk01]# drbdadm primary --force webdata

It should now sync:

[pcmk01]# watch -n.5 'cat /proc/drbd'
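
Alternatively, the role, connection state and disk state can be queried with drbdadm; once the initial synchronisation completes, the expected output is Primary/Secondary, Connected and UpToDate/UpToDate respectively:

[pcmk01]# drbdadm role webdata
[pcmk01]# drbdadm cstate webdata
[pcmk01]# drbdadm dstate webdata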

Create a filesystem on the DRBD device and tune it if required:

[pcmk01]# mkfs.ext4 -m 0 -L drbd /dev/drbd0
[pcmk01]# tune2fs -c 200 -i 180d /dev/drbd0

Populate DRBD Content

Mount the newly created disk and populate it with a web document:

[pcmk01]# mount /dev/drbd0 /mnt
[pcmk01]# cat << EOL >/mnt/index.html
DRBD backend test
EOL

We need to give the mount point the same SELinux context as the web document root. Display the security context of the document root:

[pcmk01]# ls -ldZ /var/www/html/
drwxr-xr-x. root root system_u:object_r:httpd_sys_content_t:s0 /var/www/html/

The httpd policy stores data with multiple different file context types under the /var/www directory. If we want to store the data in a different directory, we can use the semanage command to create an equivalence mapping.

[pcmk01]# semanage fcontext --add --equal /var/www /mnt
[pcmk01]# restorecon -R -v /mnt
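
The equivalence rule and the resulting label can be verified as follows; /mnt should now carry the same httpd_sys_content_t type as /var/www/html:

[pcmk01]# semanage fcontext -l | grep /mnt
[pcmk01]# ls -ldZ /mnt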

Please be advised that changes made with the chcon command do not survive a file system relabel, or the execution of the restorecon command. Always use semanage.

[pcmk01]# umount /dev/drbd0

Cluster Configuration for the DRBD Device

Create a cluster resource named my_webdata for the DRBD device, and an additional master/slave resource MyWebClone to allow my_webdata to run on both nodes at the same time (with only one instance promoted to master):

[pcmk01]# pcs cluster cib drbd_cfg
[pcmk01]# pcs -f drbd_cfg resource create my_webdata ocf:linbit:drbd \
  drbd_resource=webdata op monitor interval=10s
[pcmk01]# pcs -f drbd_cfg resource master MyWebClone my_webdata \
  master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
  notify=true

Verify resources and commit:

[pcmk01]# pcs -f drbd_cfg resource show
 Resource Group: my_webresource
     my_VIP     (ocf::heartbeat:IPaddr2):       Started pcmk01-cr
     my_website (ocf::heartbeat:apache):        Started pcmk01-cr
 Master/Slave Set: MyWebClone [my_webdata]
     Stopped: [ pcmk01-cr pcmk02-cr ]
[pcmk01]# pcs cluster cib-push drbd_cfg

Check the cluster status:

[pcmk01]# pcs status
Cluster name: test_webcluster
Last updated: Sun Dec 13 15:16:31 2015          Last change: Sun Dec 13 15:16:21 2015 by root via cibadmin on pcmk01-cr
Stack: corosync
Current DC: pcmk02-cr (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ pcmk01-cr pcmk02-cr ]

Full list of resources:

 Resource Group: my_webresource
     my_VIP     (ocf::heartbeat:IPaddr2):       Started pcmk01-cr
     my_website (ocf::heartbeat:apache):        Started pcmk01-cr
 Master/Slave Set: MyWebClone [my_webdata]
     Masters: [ pcmk01-cr ]
     Slaves: [ pcmk02-cr ]

PCSD Status:
  pcmk01-cr: Online
  pcmk02-cr: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Filesystem Configuration for the Cluster

We have a working DRBD device, now we need to mount its filesystem.

Create a cluster resource named my_webfs for the filesystem:

[pcmk01]# pcs cluster cib fs_cfg
[pcmk01]# pcs -f fs_cfg resource create my_webfs Filesystem \
  device="/dev/drbd0" directory="/var/www/html" fstype="ext4"

The filesystem resource needs to run on the same node as the master of the MyWebClone resource. Since one cluster service depends on another cluster service running on the same node, we need to assign an INFINITY score to the colocation constraint:

[pcmk01]# pcs -f fs_cfg constraint colocation add my_webfs with MyWebClone \
  INFINITY with-rsc-role=Master
[pcmk01]# pcs -f fs_cfg constraint order promote MyWebClone then start my_webfs
Adding MyWebClone my_webfs (kind: Mandatory) (Options: first-action=promote then-action=start)

Tell the cluster that the virtual IP needs to run on the same machine as the filesystem and that the filesystem must be active before the VIP can start:

[pcmk01 ~]# pcs -f fs_cfg constraint colocation add my_VIP with my_webfs INFINITY
[pcmk01 ~]# pcs -f fs_cfg constraint order my_webfs then my_VIP
Adding my_webfs my_VIP (kind: Mandatory) (Options: first-action=start then-action=start)

This way Apache is only started when the filesystem and the VIP are both available.

Verify the updated configuration:

[pcmk01]# pcs -f fs_cfg constraint
Location Constraints:
Ordering Constraints:
  promote MyWebClone then start my_webfs (kind:Mandatory)
  start my_webfs then start my_VIP (kind:Mandatory)
Colocation Constraints:
  my_webfs with MyWebClone (score:INFINITY) (with-rsc-role:Master)
  my_VIP with my_webfs (score:INFINITY)
[pcmk01]# pcs -f fs_cfg resource show
 Resource Group: my_webresource
     my_VIP     (ocf::heartbeat:IPaddr2):       Started pcmk01-cr
     my_website (ocf::heartbeat:apache):        Started pcmk01-cr
 Master/Slave Set: MyWebClone [my_webdata]
     Masters: [ pcmk01-cr ]
     Slaves: [ pcmk02-cr ]
 my_webfs       (ocf::heartbeat:Filesystem):    Stopped

Commit the changes and check the cluster status:

[pcmk01]# pcs cluster cib-push fs_cfg
[pcmk01]# pcs status
Cluster name: test_webcluster
Last updated: Sun Dec 13 15:19:01 2015          Last change: Sun Dec 13 15:18:55 2015 by root via cibadmin on pcmk01-cr
Stack: corosync
Current DC: pcmk02-cr (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ pcmk01-cr pcmk02-cr ]

Full list of resources:

 Resource Group: my_webresource
     my_VIP     (ocf::heartbeat:IPaddr2):       Started pcmk01-cr
     my_website (ocf::heartbeat:apache):        Started pcmk01-cr
 Master/Slave Set: MyWebClone [my_webdata]
     Masters: [ pcmk01-cr ]
     Slaves: [ pcmk02-cr ]
 my_webfs       (ocf::heartbeat:Filesystem):    Started pcmk01-cr

PCSD Status:
  pcmk01-cr: Online
  pcmk02-cr: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Here’s a list of cluster resources:

[pcmk01]# pcs resource show --full
 Group: my_webresource
  Resource: my_VIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.247.50.213 cidr_netmask=32
   Operations: start interval=0s timeout=20s (my_VIP-start-interval-0s)
               stop interval=0s timeout=20s (my_VIP-stop-interval-0s)
               monitor interval=10s (my_VIP-monitor-interval-10s)
  Resource: my_website (class=ocf provider=heartbeat type=apache)
   Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
   Operations: start interval=0s timeout=40s (my_website-start-interval-0s)
               stop interval=0s timeout=60s (my_website-stop-interval-0s)
               monitor interval=10s (my_website-monitor-interval-10s)
 Master: MyWebClone
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: my_webdata (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=webdata
   Operations: start interval=0s timeout=240 (my_webdata-start-interval-0s)
               promote interval=0s timeout=90 (my_webdata-promote-interval-0s)
               demote interval=0s timeout=90 (my_webdata-demote-interval-0s)
               stop interval=0s timeout=100 (my_webdata-stop-interval-0s)
               monitor interval=10s (my_webdata-monitor-interval-10s)
 Resource: my_webfs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd0 directory=/var/www/html fstype=ext4
  Operations: start interval=0s timeout=60 (my_webfs-start-interval-0s)
              stop interval=0s timeout=60 (my_webfs-stop-interval-0s)
              monitor interval=20 timeout=40 (my_webfs-monitor-interval-20)
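
At this point it is worth sanity-checking failover by hand. One simple way, using the node names from this series, is to put the active node into standby, confirm with pcs status that the DRBD master, filesystem, VIP and Apache all move to pcmk02-cr, and then bring the node back:

[pcmk01]# pcs cluster standby pcmk01-cr
[pcmk01]# pcs status
[pcmk01]# pcs cluster unstandby pcmk01-cr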

References

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html
https://drbd.linbit.com/users-guide/

6 thoughts on “Active/Passive Cluster With Pacemaker, Corosync and DRBD on CentOS 7: Part 3 – Replicate Storage with DRBD”

  1. Hi,

    First, thank you for your great article. I have a question: what happens if the primary node (pcmk01) crashes? Is there any way to create another primary node, or will our cluster be ruined?

    Thank you again,
    Abe

    • In case of a failure of the primary node, services are automatically failed over to the other node, which then becomes the primary node for those services. This is basically the idea of having a failover cluster.

  2. Such an amazing tutorial! I now feel so comfortable and confident about clustering!
    Thanks a ton. Cheers.

  3. Hi Tomas,
    Could you please suggest how I can create a custom resource? For example, instead of using ocf::heartbeat:apache I need to configure ocf::heartbeat:my_application_service.

    • Hi, cluster resource scripts are similar to init scripts in that they need to support start, stop and status. Pacemaker supports LSB-compliant scripts natively, so in theory, as long as your script is LSB-compliant, you can create one.

      To give you an idea of the process, you will need to:

      1) ensure that script is LSB-compliant
      2) copy the script under /etc/init.d
      3) check to make sure that your script can be seen by pacemaker
      4) add the script as cluster resource
      5) check the configuration of the resource
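
      For example, assuming the init script is installed as /etc/init.d/my_application_service on all nodes, it could be added along these lines:

      [pcmk01]# pcs resource create my_application lsb:my_application_service op monitor interval=30s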

      I hope that helps!
