Corosync and Pacemaker in Slackware
This will be a multi-part post about a high availability solution for Slackware. My first post will be about Corosync and Pacemaker.
You need to combine Corosync and Pacemaker with a distributed storage system such as DRBD/OCFS2/GFS. I’ll talk about those stacks in another post.
GOAL:
- A MySQL server will always be available at the same IP even if the node serving it goes down (another server will take over automatically without the need for manual intervention).
Environment:
Slackware v13.37
Two nodes will be used:
Node 1: 192.168.1.101
Node 2: 192.168.1.102
Cluster/Main/Failover IP: 192.168.1.100
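Throughout this post the nodes show up as node1 and node2 (that’s what crm status will print later), so I’m assuming each hostname resolves to the right address on both machines. If you don’t have DNS for them, something like this in /etc/hosts on both nodes will do (the hostnames here are my assumption based on the output later in the post):
192.168.1.101   node1
192.168.1.102   node2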
The MySQL data is not synchronized; this post is just about Corosync and Pacemaker.
Guide:
Download and install these packages (in this order) on both nodes:
http://slackbuilds.org/repository/13.37/libraries/libnet/
http://slackbuilds.org/repository/13.37/libraries/libesmtp/
http://slackbuilds.org/repository/13.37/system/clusterglue/
http://slackbuilds.org/repository/13.37/system/clusterresourceagents/
http://slackbuilds.org/repository/13.37/system/corosync/
http://slackbuilds.org/repository/13.37/system/pacemaker/
I strongly suggest you build these packages one by one just to be sure there are no missing dependencies. BTW, some script adjustments are needed for Cluster Resource Agents, but I’m sure you can handle it ;-)
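In case you haven’t used SlackBuilds before, the build-and-install cycle for each package looks roughly like this (shown for libnet; the exact package filename produced under /tmp depends on the version and architecture you download):
# download the SlackBuild archive and the matching source tarball from the links above
tar xvf libnet.tar.gz
cd libnet
# drop the libnet source tarball into this directory, then build as root
./libnet.SlackBuild
# install the resulting package (filename will vary)
installpkg /tmp/libnet-*_SBo.t?z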
It would be easier for the next steps if password-less login with OpenSSH is enabled. On Node 1:
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.1.102
Generate an authentication key for Corosync:
corosync-keygen
corosync-keygen gathers entropy from /dev/random, so if you’re connecting remotely, pressing your keyboard won’t do any good; remote keystrokes never reach the server’s entropy pool. The fastest way would be typing directly on the server’s console. The other way is to generate some disk activity, for example by running find . on your / directory in another session (press Control + C when the key has been generated).
Copy the newly generated authentication key to Node 2:
scp /etc/corosync/authkey 192.168.1.102:/etc/corosync
Copy the default corosync configuration file:
cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
Replace bindnetaddr and logfile (optional):
bindnetaddr: 192.168.1.0
logfile: /var/log/corosync
You can check the reference for the meaning of those values. From the corosync documentation:
If the local interface was 10.12.12.93 and the netmask was 255.0.0.0, Totem would execute the logical operation 10.12.12.93 & 255.0.0.0 and produce the value 10.0.0.0. This value would be compared against bindnetaddr and bind Totem to the NIC that matches. This can cause confusion if netmask or bindnetaddr are not set properly. In the example above, if bindnetaddr is 10.12.12.0, the network interface will never be matched. If bindnetaddr is 10.0.0.0 the interface will be matched.
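For orientation, here is roughly what the relevant pieces of corosync.conf look like after the edit. Everything except bindnetaddr and the logfile setting is whatever the shipped example file already contains; the multicast values below are just the usual example defaults, not something you need to change:
totem {
        version: 2
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

logging {
        to_logfile: yes
        logfile: /var/log/corosync
        to_syslog: yes
}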
Copy corosync.conf to Node 2:
scp /etc/corosync/corosync.conf 192.168.1.102:/etc/corosync
Create a pacemaker file so that Corosync will automatically load Pacemaker when it’s started:
touch /etc/corosync/service.d/pacemaker
Put this config in that file:
service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver: 0
}
Copy the pacemaker file to Node 2:
scp /etc/corosync/service.d/pacemaker 192.168.1.102:/etc/corosync/service.d/
Start Corosync on both nodes and let the magic begin:
/etc/rc.d/rc.corosync start
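If you also want Corosync to come up at boot, the usual Slackware way (my own habit, not something the package sets up for you) is a small snippet in /etc/rc.d/rc.local:
# /etc/rc.d/rc.local
if [ -x /etc/rc.d/rc.corosync ]; then
  /etc/rc.d/rc.corosync start
fi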
Check your log for any error:
tail -f /var/log/corosync
Check your process list:
ps auxf
Corosync should also load other processes automatically:
root   2008  0.5  3.4  52668  3964 ?  Ssl  13:55  0:00 corosync
root   2015  0.0  1.9  12140  2248 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/stonithd
226    2016  0.3  3.3  13004  3796 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/cib
root   2017  0.0  1.6   6812  1848 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/lrmd
226    2018  0.1  2.2  12404  2540 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/attrd
226    2019  0.0  1.7   8664  2032 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/pengine
226    2020  0.1  2.5  12528  2904 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/crmd
Monitor your cluster using Pacemaker tools:
crm status
It should be something like this:
============
Last updated: Sun May 13 13:57:43 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node1 node2 ]
Give the nodes some time to come online if they show as offline.
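Instead of re-running crm status over and over, you can also watch the cluster state update live with crm_mon, which ships with Pacemaker (press Ctrl+C to exit):
crm_mon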
Apply some basic configuration to your cluster:
crm configure
property stonith-enabled=false
property no-quorum-policy=ignore
commit
quit
If you’re getting an error such as ERROR: cib-bootstrap-options: attribute last-lrm-refresh does not exist, just proceed; it may be a bug.
We had to disable stonith since we just want our Pacemaker to be running. However, in a real production environment you really need to configure stonith; you can read more about it here.
We also need to ignore the quorum policy since we’re only using 2 nodes; you can read more about it here.
You can see your new configuration by running:
crm configure show
Which will output:
node node1
node node2
property $id="cib-bootstrap-options" \
        dc-version="1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        last-lrm-refresh="1336919205" \
        no-quorum-policy="ignore"
If you accidentally put in a wrong configuration and don’t know how to fix it, you can use crm configure edit to change your configuration directly, but this method is strongly discouraged since it’s error-prone.
It’s time to configure our main/failover/cluster IP (our clients will use this IP, not the node IPs):
crm configure
primitive ip ocf:heartbeat:IPaddr params ip="192.168.1.100" op monitor interval=10s
commit
If everything goes well, you should be able to ping the cluster IP (192.168.1.100) and crm status should yield this result:
============
Last updated: Sun May 13 14:28:19 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node1 node2 ]

ip      (ocf::heartbeat:IPaddr):        Started node1
We’ll now set up MySQL monitoring with Pacemaker. But before that, make sure you have:
- Installed MySQL on both nodes.
- Made sure you can connect to your MySQL from a host other than localhost:
mysql -u root -p -h 192.168.1.101
mysql -u root -p -h 192.168.1.102
You can use this command to allow any host to connect to your MySQL:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'password' WITH GRANT OPTION;
FLUSH PRIVILEGES;
- Created a database on both Node 1 and Node 2, for example a database named node1 on Node 1 and node2 on Node 2. This is just for verification.
Add this resource:
crm configure
primitive mysql ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" user="mysql" pid="/var/run/mysql/mysql.pid" datadir="/var/lib/mysql" socket="/var/run/mysql/mysql.sock" \
        op monitor interval="30s" timeout="30s" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120"
commit
quit
The parameters above are based purely on the standard Slackware MySQL package. So make sure you’ve created /etc/my.cnf, which is not available by default. Just copy it from the sample file:
cp /etc/my-small.cnf /etc/my.cnf
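One thing worth pointing out (my own suggestion, not part of the original steps): since Pacemaker will start and stop mysqld through the resource agent, you probably don’t want Slackware’s init scripts launching MySQL at boot as well. Making rc.mysqld non-executable on both nodes keeps rc.M from starting it:
chmod -x /etc/rc.d/rc.mysqld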
Your latest crm status would show something like this:
============
Last updated: Mon May 14 01:13:23 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ node1 node2 ]

ip      (ocf::heartbeat:IPaddr):        Started node1
mysql   (ocf::heartbeat:mysql):         Started node2
As you can see, mysql has been started on Node 2. It actually doesn’t matter on which node it starts first (for this tutorial, not for a production server); what’s important is that if one of the nodes goes down, the other node should start its MySQL automatically. You can test this situation by running these commands on Node 2 to simulate a node failure:
crm node
standby
quit
crm status would show something like this (give Node 1 some time before it starts its MySQL):
============
Last updated: Mon May 14 01:21:12 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Node node2: standby
Online: [ node1 ]

ip      (ocf::heartbeat:IPaddr):        Started node1
mysql   (ocf::heartbeat:mysql):         Started node1
Right now, your clients can use the cluster IP (192.168.1.100) to connect to your MySQL. A client won’t know which node it is connected to. In this case, it will connect to Node 2 if both nodes are online. If Node 2 is offline, 192.168.1.100 will automatically connect the client to the MySQL on 192.168.1.101. If Node 1 is offline, 192.168.1.100 will automatically use the MySQL on Node 2, which is at 192.168.1.102.
To bring Node 2 back online, just run these commands on Node 2:
crm node
online
quit
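This is where the node1/node2 verification databases created earlier come in handy: connect through the cluster IP and see which one shows up (assuming the GRANT above was applied on both nodes):
mysql -u root -p -h 192.168.1.100 -e 'SHOW DATABASES;'
# seeing 'node1' means you reached Node 1, 'node2' means Node 2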
However, usually you want to control which MySQL will be up, on Node 1 or on Node 2, and you certainly want it running on the same node that holds the cluster IP. To make this happen, you need to use colocation:
crm configure
colocation ip-mysql inf: ip mysql
commit
quit
crm status would show something like this:
============
Last updated: Mon May 14 01:26:41 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ node1 node2 ]

ip      (ocf::heartbeat:IPaddr):        Started node1
mysql   (ocf::heartbeat:mysql):         Started node1
That means your mysql has been started on Node 1. So, every time corosync is started on both nodes, mysql will be started on Node 1 due to the colocation configuration.
Try turning off Node 1 or Node 2 and watch MySQL switch sides between the nodes.
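Besides actually powering a node off, a quick way to simulate a failure is to just stop Corosync on that node, watch crm status on the survivor, and then start it again afterwards (the rc script from the corosync SlackBuild should support stop as well):
/etc/rc.d/rc.corosync stop
# ...check crm status on the other node, then bring this one back:
/etc/rc.d/rc.corosync start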
I think that’s it; the next tutorial should be mainly about DRBD. Good luck!