Updated: September 21, 2021
In this short guide, we’ll show you how to build an Axigen active-passive cluster based on the Pacemaker and Corosync cluster stack documented on the Cluster Labs website.
The steps below must be performed on both nodes unless specified otherwise.
How to Set Up an Active-Passive Cluster Based on Pacemaker and Corosync
Installation
Following the instructions from Cluster Labs: RHEL7 Quickstart and Cluster Labs: Clusters from Scratch, we start by installing all needed packages:
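For reference, on a RHEL / CentOS 7 system the Clusters from Scratch guide uses the package set below; the exact list may differ for your distribution, and the DRBD packages are assumed to be installed already:

    yum install -y pacemaker pcs psmisc policycoreutils-python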
If you are using the default SELinux configuration, you must turn on the daemons_enable_cluster_mode boolean, which is disabled by default:
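On a system with SELinux enabled, the boolean can be switched on persistently with setsebool:

    setsebool -P daemons_enable_cluster_mode 1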
The pcs daemon works together with the pcs command-line interface to keep the Corosync configuration synchronized across all nodes in the cluster. Before the cluster can be configured, the pcs daemon must be started and enabled to start at boot time on each node, using the following commands:
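On systemd-based distributions, such as the RHEL 7 family targeted by the referenced quickstart, this translates to:

    systemctl start pcsd.service
    systemctl enable pcsd.service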
We have chosen to not enable the Corosync and Pacemaker services to start at boot. If a cluster node fails or is rebooted, you will need to run the following command to start the cluster on it:
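On the node in question, the cluster services are started with:

    pcs cluster start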
If you would like to have the cluster services up when the node is started, you should set the Pacemaker and Corosync services to start at boot:
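For example, using systemd on both nodes:

    systemctl enable corosync.service
    systemctl enable pacemaker.service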
Please note that requiring a manual start of cluster services gives you the opportunity to do a post-mortem investigation of a node failure before returning it to the cluster.
Configuration
Before starting the cluster configuration, we have to make sure that both nodes are also reachable by their short names. If DNS does not resolve them accordingly (based on the default search domains), you should add the following lines to /etc/hosts on both nodes:
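A minimal example, assuming the hypothetical short names node1 / node2 and the addresses 10.0.0.1 / 10.0.0.2; replace them with your actual host names and IP addresses:

    10.0.0.1    node1.example.com    node1
    10.0.0.2    node2.example.com    node2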
Configure the Cluster
First, we configure the password for the user running the cluster processes on both nodes:
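The pcs packages create the hacluster user for this purpose, so, assuming the default setup, the password is set with:

    passwd hacluster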
Next, we have to authenticate pcs to pcsd on both nodes:
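With the pcs version shipped in RHEL / CentOS 7, and assuming the node names node1 and node2, this is done as follows (you will be prompted for the hacluster user name and password):

    pcs cluster auth node1 node2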
Then, on only one node, we create the cluster and populate it with the nodes:
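A sketch using a hypothetical cluster name, axigen_cluster, and the node names assumed above:

    pcs cluster setup --name axigen_cluster node1 node2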
Start the Cluster
We start the cluster on only one node:
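With pcs, a single command issued on one node can start the cluster services on all configured nodes:

    pcs cluster start --all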
From now on, unless specifically mentioned otherwise, all commands should be run on only one node.
Prepare the Cluster
For data safety, the cluster default configuration has STONITH enabled. We will disable it and configure it at a later point, by setting the stonith-enabled cluster option to false:
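With pcs:

    pcs property set stonith-enabled=false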
In order to reduce the possibility of data corruption, Pacemaker's default behavior is to stop all resources if the cluster does not have quorum. Because a cluster is said to have quorum when more than half of the known or expected nodes are online, a two-node cluster only has quorum when both nodes are running, which makes this default behavior unsuitable for our setup. It is possible to control how Pacemaker behaves when quorum is lost; in particular, we can tell the cluster to simply ignore quorum altogether:
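With pcs, this means setting the no-quorum-policy cluster property:

    pcs property set no-quorum-policy=ignore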
Then, on both nodes, verify the cluster status report with the pcs status command:
If on a particular node pcs status is reporting:
then you should repeat the authentication command on that particular node:
Because resources are started by the cluster immediately after their creation, we will put both nodes in standby and bring them back online after finishing the resource configuration. To put both nodes in standby, just issue the following two commands:
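Assuming the node names node1 and node2 and the pcs syntax from RHEL / CentOS 7:

    pcs cluster standby node1
    pcs cluster standby node2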
The pcs cluster status command will list the nodes, confirming their standby status.
DRBD Resource
Create a configuration file to be used to commit all configuration changes atomically:
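With pcs, the current CIB is saved into the drbd_cfg working file; the subsequent resource commands will modify this file through the -f option instead of the live configuration:

    pcs cluster cib drbd_cfg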
The first resource you need to add to your cluster is the DRBD file system you have previously created. This functionality is provided by the ocf:linbit:drbd resource agent, as follows:
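A sketch, assuming the underlying DRBD resource (as named in your DRBD configuration) is called axigen_ha; replace it with your actual DRBD resource name:

    pcs -f drbd_cfg resource create drbd_axigen_ha ocf:linbit:drbd \
        drbd_resource=axigen_ha op monitor interval=60s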
The above resource, named drbd_axigen_ha, specifies only the DRBD resource as a parameter and a monitoring interval of 60 seconds.
Next, we need to create a master / slave resource, which will tell the cluster manager to only run the drbd_axigen_ha resource on the node that has DRBD configured as primary.
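A sketch using the pcs 0.9 master / slave syntax and a hypothetical resource name, drbd_axigen_ha_ms:

    pcs -f drbd_cfg resource master drbd_axigen_ha_ms drbd_axigen_ha \
        master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true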
The third resource related to DRBD is the file system mount itself, provided by the ocf:heartbeat:Filesystem resource agent, configured with parameters specifying the device to mount, the mount point, and the file system type.
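A sketch, assuming a hypothetical resource name fs_axigen_ha, the DRBD device /dev/drbd0, the Axigen data directory /var/opt/axigen as the mount point, and an ext4 file system; adjust these values to your setup:

    pcs -f drbd_cfg resource create fs_axigen_ha ocf:heartbeat:Filesystem \
        device="/dev/drbd0" directory="/var/opt/axigen" fstype="ext4"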
Finally, we have to specify that the file system resource must run on the Master node, and that the mount action must take place only after the master / slave resource has been promoted on that machine:
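Using the resource names assumed above, this translates into a colocation constraint (file system together with the Master role) and an ordering constraint (mount only after promotion):

    pcs -f drbd_cfg constraint colocation add fs_axigen_ha with drbd_axigen_ha_ms \
        INFINITY with-rsc-role=Master
    pcs -f drbd_cfg constraint order promote drbd_axigen_ha_ms then start fs_axigen_ha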
Review the configured resources:
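With the pcs version assumed in this guide, the resources defined in the drbd_cfg working file can be listed with:

    pcs -f drbd_cfg resource show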
After you are satisfied with all the changes, you can commit them all at once by pushing the drbd_cfg file into the live CIB:
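The push is done with:

    pcs cluster cib-push drbd_cfg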
IP Resource
A floating IP address must be assigned to the active node in the cluster, to ensure transparency for the Axigen services. This can be achieved by defining a resource based on the ocf:heartbeat:IPaddr2 agent, as follows:
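A sketch, assuming a hypothetical resource name axigen_ip, the floating address 10.0.0.10, and a /24 netmask; replace them with your actual values:

    pcs resource create axigen_ip ocf:heartbeat:IPaddr2 \
        ip=10.0.0.10 cidr_netmask=24 op monitor interval=30s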
Axigen Service Resource
The last resource is the Axigen init script configured above, which should be added as in the example below:
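A sketch, assuming the Axigen init script is installed as /etc/init.d/axigen and is therefore available to the cluster as lsb:axigen, with a hypothetical resource name of axigen:

    pcs resource create axigen lsb:axigen op monitor interval=30s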
Resource Ordering
Because the successful startup of the defined resources depends on their order, you have to add some ordering constraints, which will ensure the following order: File system → IP Address → Axigen init script.
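Using the resource names assumed above, the ordering constraints would look like this:

    pcs constraint order fs_axigen_ha then axigen_ip
    pcs constraint order axigen_ip then axigen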
Location Preference
As with the resource ordering, besides being started in a specific order, the resources also need to run on the same machine. To achieve this, the IP address and file system resources are constrained to run on the same node as the Axigen init script resource:
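A sketch with the resource names assumed above, colocating the IP address and the file system with the Axigen resource:

    pcs constraint colocation add axigen_ip with axigen INFINITY
    pcs constraint colocation add fs_axigen_ha with axigen INFINITY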
You can also set up a preferred node for running the cluster resources by specifying a location constraint. For example, you can set node1 as the preferred node for running the Axigen init script resource (and its dependencies):
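For example, with the assumed axigen resource name and a score of 50:

    pcs constraint location axigen prefers node1=50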
Sometimes, a failed node eventually comes back online. To avoid the resources being transferred back to it (which would generate additional downtime), you can set a general cluster resource stickiness with a higher score than the node preference defined above, as follows:
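For example, a default stickiness of 100, higher than the location score of 50 used above:

    pcs resource defaults resource-stickiness=100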
Fencing
STONITH is an acronym for “Shoot The Other Node In The Head” and it protects your data from being corrupted by rogue nodes or concurrent access. With Pacemaker, STONITH is a node fencing daemon which also must be configured to achieve full data safety.
Using the example configuration from this document (APC), we have defined a STONITH fencing device as follows:
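A heavily simplified sketch, assuming an APC PDU reachable at the hypothetical address apc-pdu.example.com, hypothetical credentials, the fence_apc agent, and node1 / node2 plugged into outlets 1 / 2; consult the fence_apc documentation for the exact parameters of your device:

    pcs stonith create apc_fence fence_apc \
        ipaddr="apc-pdu.example.com" login="apc" passwd="apc" \
        pcmk_host_map="node1:1;node2:2" op monitor interval=60s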
Set back the stonith-enabled cluster property you have switched off at the beginning of the cluster setup:
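With pcs:

    pcs property set stonith-enabled=true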
Notes for Other Fencing Devices
- Using ESXi / VMware (VMware over SOAP Fencing)
Bring Both Nodes Online
After we have checked that the cluster is configured correctly and that none of the components returned any errors, we can bring both nodes online:
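With the node names assumed earlier:

    pcs cluster unstandby node1
    pcs cluster unstandby node2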