Install and configure High Availability Linux Cluster with Pacemaker on CentOS 7.6
Introduction to Linux Cluster with Pacemaker
Linux Cluster with Pacemaker is one of the most common clusters we can set up on Linux servers. Pacemaker is available for both RPM-based and Debian-based operating systems.
Pacemaker is a high-availability cluster resource manager. It runs on all the hosts we plan to use in the cluster and keeps our services up and running to reduce downtime. Pacemaker supports the following node redundancy configurations: Active/Active, Active/Passive, N+1, N+M, N-to-1 and N-to-N. The maximum number of nodes accepted in a cluster is 16.
As is common with every cluster, the underlying operating system distribution and version should be the same on all the nodes; we are going to use CentOS 7.6 in our setup. Moreover, the hardware specifications should match as well.
Perform a minimal OS installation to start setting up the cluster. Follow the guide below to complete the minimal installation on all the nodes planned for the cluster.
Cluster setup is very sensitive and needs proper time synchronization. While following the above guide for the minimal OS installation, make sure to set the correct time/date, a static IP with the network configuration, and the disk setup.
In our setup, we will use the hostnames and IP information below for all the nodes in our cluster.
S.NO | SHORT NAME | HOSTNAME | IP ADDRESS
1 | corcls1 | corcls1.linuxsysadmins.local | 192.168.107.200
2 | corcls2 | corcls2.linuxsysadmins.local | 192.168.107.201
3 | corcls3 | corcls3.linuxsysadmins.local | 192.168.107.202
4 | corcls4 | corcls4.linuxsysadmins.local | 192.168.107.203
5 | VirtualIP | web-vip | 192.168.107.225
In this first guide, we will use only two nodes; later we will add the remaining two nodes to the cluster as additional nodes.
All the steps below need to be carried out on each node, except “Configure CoroSync”, which needs to be carried out only on node1.
Network Setup
If you want to skip the network and NTP setup and handle it as part of post-installation, skip the graphical demonstration; below you will find the commands to configure the hostname, interface and NTP. But make sure never to skip the disk partitioning.
We need to configure a static IP to keep the cluster stable by eliminating IP assignment from DHCP servers, because DHCP's periodic address renewal will interfere with corosync. To sync the date/time, we first need to complete the network configuration.
To download the packages from the Internet, make sure the gateway is reachable.
Type the hostname in the designated area and click Apply to make the changes.
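If you are doing this from the shell instead of the installer, the same change for node1 (using the hostname from the table above) would look like this sketch:
# hostnamectl set-hostname corcls1.linuxsysadmins.local
# hostnamectl status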
Configure Timezone and NTP server
Choose your timezone where your server resides.
To configure the NTP server, click on the Gear icon and add the timeserver or use the default existing ones.
The working servers can be identified by their status in green.
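If you prefer to configure NTP after installation instead, CentOS 7 ships chrony by default; a minimal sketch is to add your timeserver as a "server <your-ntp-server> iburst" line (or keep the stock pool entries) and restart the service:
# vi /etc/chrony.conf
# systemctl enable chronyd
# systemctl restart chronyd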
Partitioning the Disk
The partitions for /, /home and swap should be kept small in size; the remaining space can be left in the volume group for future use.
Select the filesystem type as XFS and device type as LVM.
Click on Modify under Volume Group. You will get the above window; choose the size policy “As large as possible” to keep the remaining space under the Volume Group.
The remaining steps are the same as a minimal operating system installation.
Set the System Locale
If your setup is a minimal installation, it is required to set the C-type locale language to en_US.utf8.
# localectl set-locale LC_CTYPE=en_US.utf8
Print the status to verify the same.
[root@corcls1 ~]# localectl status
System Locale: LC_CTYPE=en_US.utf8
VC Keymap: us
X11 Layout: us
[root@corcls1 ~]#
Assigning Static IP Address from CLI
If you skipped the network settings during the OS installation, we can configure the static IP later by running the nmcli command.
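A minimal sketch for node1, assuming the connection name is ens33 and the lab gateway/DNS is 192.168.107.1 (adjust to the names and addresses reported by "nmcli connection show" in your environment):
# nmcli connection modify ens33 ipv4.method manual \
    ipv4.addresses 192.168.107.200/24 \
    ipv4.gateway 192.168.107.1 \
    ipv4.dns 192.168.107.1
# nmcli connection up ens33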
If you missed assigning the timezone during the graphical installation, we can configure it after installing the operating system as a post-installation step.
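For example, to set the timezone used in this lab from the command line:
# timedatectl set-timezone Asia/Dubai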
[root@corcls1 ~]# timedatectl status
Local time: Sat 2019-08-03 17:02:49 +04
Universal time: Sat 2019-08-03 13:02:49 UTC
RTC time: Sat 2019-08-03 13:02:49
Time zone: Asia/Dubai (+04, +0400)
NTP enabled: yes
NTP synchronized: yes
RTC in local TZ: no
DST active: n/a
[root@corcls1 ~]#
Verify the sync status; use -v to get more informative output.
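Assuming chrony is the time service (the CentOS 7 default), the sources and their sync state can be checked with:
# chronyc tracking
# chronyc sources -v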
To perform admin tasks, run privileged commands or copy files between the nodes, set up SSH passwordless authentication by generating an SSH key.
# ssh-keygen
[root@corcls1 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:ATVwcKf58ApvYaWsURZVOabinsQBGaSwiuwAGCW0BzY root@corcls1.linuxsysadmins.local
The key's randomart image is:
+---[RSA 2048]----+
|+Eo .***.o… |
|oo+o .o+ * + |
|o…. .B .o . |
|+.. +oB. |
|+. ooSoo |
|o *+o |
| . .o+. |
| .o |
| |
+----[SHA256]-----+
[root@corcls1 ~]#
Copy the generated SSH key to all the nodes.
# ssh-copy-id root@corcls2
[root@corcls1 ~]# ssh-copy-id root@corcls2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'corcls2 (192.168.107.201)' can't be established.
ECDSA key fingerprint is SHA256:Q6D+CZ+PH9PEmUIJwOkJeWBz91z273zwXEBPjk81mX0.
ECDSA key fingerprint is MD5:a3:35:63:21:01:ae:df:3e:6d:b3:6b:79:d9:0d:ff:a8.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@corcls2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@corcls2'"
and check to make sure that only the key(s) you wanted were added.
[root@corcls1 ~]#
Verify the passwordless authentication by logging into all the nodes.
[root@corcls1 ~]#
[root@corcls1 ~]# ssh root@corcls2
Last login: Fri Aug 2 12:14:21 2019 from 192.168.107.1
[root@corcls2 ~]# exit
logout
Connection to corcls2 closed.
[root@corcls1 ~]#
Allow Cluster services through Firewall
Enable the required ports by enabling the High Availability firewalld service.
Below is the firewalld service definition that provides the required ports.
[root@corcls1 ~]# cat /usr/lib/firewalld/services/high-availability.xml
<?xml version="1.0" encoding="utf-8"?>
<service>
<short>Red Hat High Availability</short>
<description>This allows you to use the Red Hat High Availability (previously named Red Hat Cluster Suite). Ports are opened for corosync, pcsd, pacemaker_remote, dlm and corosync-qnetd.</description>
<port protocol="tcp" port="2224"/>
<port protocol="tcp" port="3121"/>
<port protocol="tcp" port="5403"/>
<port protocol="udp" port="5404"/>
<port protocol="udp" port="5405"/>
<port protocol="tcp" port="9929"/>
<port protocol="udp" port="9929"/>
<port protocol="tcp" port="21064"/>
</service>
[root@corcls1 ~]#
Running the below commands allows all of these ports by adding the high-availability service, then reloads firewalld to apply the changes.
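Run the following on each node:
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload
# firewall-cmd --list-services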
Verify the installed Corosync version.
[root@corcls1 ~]# corosync -v
Corosync Cluster Engine, version '2.4.3'
Copyright (c) 2006-2009 Red Hat, Inc.
[root@corcls1 ~]#
Start the Cluster Service
Once the cluster packages are installed, start the cluster service pcsd and enable it persistently so that it starts during a system reboot.
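Assuming the standard CentOS 7 High Availability packages (the exact package list is an assumption here), a typical sequence on every node would be:
# yum install -y pcs pacemaker fence-agents-all
# systemctl start pcsd.service
# systemctl enable pcsd.service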
Installing pcs creates a user named hacluster; we need to set a password for this user across all the nodes. It can be done by running the command on individual nodes, or from any one of the nodes using SSH.
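A sketch of these steps, plus the cluster bootstrap that follows, using a placeholder password and the hypothetical cluster name web_cluster (substitute your own values):
# echo "StrongPassword" | passwd --stdin hacluster
# ssh root@corcls2 'echo "StrongPassword" | passwd --stdin hacluster'
# pcs cluster auth corcls1 corcls2 -u hacluster -p StrongPassword
# pcs cluster setup --name web_cluster corcls1 corcls2
# pcs cluster start --all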
Verify the cluster status; we should now see both servers reported as online.
# pcs cluster status
We can run the below command as well to check the cluster status.
# pcs status
Enable the Cluster Service Persistently
To bring up the cluster service and join the nodes to the cluster automatically at boot, use the “enable” option with the pcs command.
# pcs cluster enable --all
The output should show that the cluster services are persistently enabled on both nodes.
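As an additional check, systemd should now report the cluster units as enabled on each node:
# systemctl is-enabled corosync pacemaker pcsd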
Verify the quorum and voting status with any one of the below commands.
# pcs status quorum
# corosync-quorumtool
[root@corcls1 ~]# pcs status quorum
Quorum information
------------------
Date: Fri Aug 2 13:53:31 2019
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 1
Ring ID: 1/8
Quorate: Yes
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 1
Flags: 2Node Quorate WaitForAll
Membership information
----------------------
Nodeid Votes Qdevice Name
1 1 NR corcls1 (local)
2 1 NR corcls2
[root@corcls1 ~]#
Check the status of CoroSync
CoroSync is the cluster engine which provides services like membership, messaging and quorum.
# pcs status corosync
[root@corcls1 ~]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 corcls1 (local)
2 1 corcls2
[root@corcls1 ~]#
Verify the CoroSync & CIB Configuration
It is good to know where the corosync and CIB configuration files reside.
The CIB, or Cluster Information Base, is saved in XML format and keeps track of the state of all nodes and resources. The CIB is synchronized across the cluster and handles requests to modify it.
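On CentOS 7 these live at the following paths; the CIB itself is managed by the cluster and should not be edited by hand:
# cat /etc/corosync/corosync.conf
# ls -l /var/lib/pacemaker/cib/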
To view the cluster information base, use the cib option with the pcs command.
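For example, to dump the live CIB in XML:
# pcs cluster cib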
You can notice the configuration with node information, cluster name and much more.
Logs to look for
Below is the log file we need to check for anything related to the cluster services.
# tail -f /var/log/cluster/corosync.log
[root@corcls1 ~]# tail -n 10 /var/log/cluster/corosync.log
Aug 03 14:01:49 [1406] corcls1.linuxsysadmins.local cib: info: cib_file_backup: Archived previous version as /var/lib/pacemaker/cib/cib-10.raw
Aug 03 14:01:49 [1406] corcls1.linuxsysadmins.local cib: info: cib_file_write_with_digest: Wrote version 0.7.0 of the CIB to disk (digest: b1e78c0e1364bb94dec0fefdd2ff1bd1)
Aug 03 14:01:49 [1406] corcls1.linuxsysadmins.local cib: info: cib_file_write_with_digest: Reading cluster configuration file /var/lib/pacemaker/cib/cib.PeKvG9 (digest: /var/lib/pacemaker/cib/cib.GkgYGD)
Aug 03 14:01:54 [1406] corcls1.linuxsysadmins.local cib: info: cib_process_ping: Reporting our current digest to corcls2: 2e36d8d0181912ebe6a1f058cb613057 for 0.7.4 (0x55c951db95f0 0)
Aug 03 14:01:58 [1414] corcls1.linuxsysadmins.local crmd: info: crm_procfs_pid_of: Found cib active as process 1406
Aug 03 14:01:58 [1414] corcls1.linuxsysadmins.local crmd: notice: throttle_check_thresholds: High CPU load detected: 1.390000
Aug 03 14:01:58 [1414] corcls1.linuxsysadmins.local crmd: info: throttle_send_command: New throttle mode: 0100 (was ffffffff)
Aug 03 14:02:28 [1414] corcls1.linuxsysadmins.local crmd: info: throttle_check_thresholds: Moderate CPU load detected: 0.920000
Aug 03 14:02:28 [1414] corcls1.linuxsysadmins.local crmd: info: throttle_send_command: New throttle mode: 0010 (was 0100)
Aug 03 14:02:58 [1414] corcls1.linuxsysadmins.local crmd: info: throttle_send_command: New throttle mode: 0000 (was 0010)
[root@corcls1 ~]#
That’s it, we have completed the basic Pacemaker cluster setup.
In our next guide, let’s see how to manage the cluster from the GUI.
Conclusion
The basic Pacemaker Linux cluster setup will provide high availability for any services configured to use it. Let’s see how to create a resource, learn about fencing and much more in upcoming articles. Subscribe to our newsletter and stay with us to receive updates. Your feedback is most welcome in the comment section below.
3 thoughts on “Install and configure High Availability Linux Cluster with Pacemaker on CentOS 7.6”
Hello. I have a question, so I wrote it down. I tried to check the resource part of Pacemaker by inserting a DRBD, and I succeeded. But I wonder if there is a way to put monitoring tools such as Nagios in the resource section.
Hi,
We are new to Apache CloudStack. We are looking for a Primary Storage (NFS share) solution that does not fail because of a single node failure. Is there a way I can use the NFS share with any kind of clustering, so that when one node fails I will still have the VMs working from another node in the cluster?
Basically, we need a system as follows:
1. One single IP address with the shared mount point being the same
2. NFS storage, as Apache CloudStack supports HA only with NFS.
3. I need to deploy around 60 VMs for our application.
Regards,
Mark
Excellent guide