Corosync and Pacemaker in Slackware
This will be a multi-part post about a high availability solution for Slackware. This first post is about Corosync and Pacemaker.
You need to combine Corosync and Pacemaker with a distributed storage system such as DRBD/OCFS2/GFS. I'll talk about those stacks in another post.
GOAL:
- A MySQL server will always be available at the same IP even when the node serving it goes down (another server will take over automatically, without the need for manual intervention).
Environments:
Slackware v13.37
Two nodes will be used:
Node 1: 192.168.1.101
Node 2: 192.168.1.102
Cluster/Main/Failover IP: 192.168.1.100
The MySQL data is not synchronized; this post is just about Corosync and Pacemaker.
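The crm status output later in this post refers to the machines as node1 and node2 (Pacemaker picks up each machine's hostname), so it's assumed the hosts are named accordingly. If you also want each node to resolve the other by name, /etc/hosts entries like these on both nodes will do (an assumption for convenience, not a strict requirement):

192.168.1.101 node1
192.168.1.102 node2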
Guides:
Download and install these packages (in this order) on both nodes:
http://slackbuilds.org/repository/13.37/libraries/libnet/
http://slackbuilds.org/repository/13.37/libraries/libesmtp/
http://slackbuilds.org/repository/13.37/system/clusterglue/
http://slackbuilds.org/repository/13.37/system/clusterresourceagents/
http://slackbuilds.org/repository/13.37/system/corosync/
http://slackbuilds.org/repository/13.37/system/pacemaker/

I strongly suggest you build these packages one by one, just to be sure there are no missing dependencies. BTW, some script adjustments are needed for Cluster Resource Agents, but I'm sure you guys can handle it ;-)
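If you haven't used SlackBuilds before, the per-package routine looks roughly like this (a sketch using libnet as an example; the exact tarball and package names depend on the version you download):

# Grab the SlackBuild tarball and the source tarball from the pages above
tar xzf libnet.tar.gz
cd libnet
# The source tarball goes in this directory before running the script
./libnet.SlackBuild
# Install the package the script leaves in /tmp by default
installpkg /tmp/libnet-*_SBo.tgz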
The next steps will be easier if password-less login with OpenSSH is enabled. On your Node 1:

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.1.102

Generate an authentication key for Corosync:
corosync-keygen

If you're connecting remotely, pressing keys on your keyboard won't do any good, since the key generator reads entropy from the server's own /dev/random. The fastest way is to type directly on the server's console. The other way is to run find . on your / directory in another session (press Control + C when the key has been generated).
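In practice that means two sessions on the server; the find run exists only to generate disk activity that feeds /dev/random (a sketch):

# Session 1: blocks until enough entropy is available
corosync-keygen

# Session 2: create disk activity until session 1 finishes,
# then stop it with Control + C
cd /
find . > /dev/null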
Copy the newly generated authentication key to Node 2:

scp /etc/corosync/authkey 192.168.1.102:/etc/corosync

Copy the default corosync configuration file:
cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
Replace bindnetaddr and logfile (the latter is optional):

bindnetaddr: 192.168.1.0
logfile: /var/log/corosync

You can check the reference for those values. From corosync:
If the local interface was 10.12.12.93 and the netmask was 255.0.0.0, Totem would execute the logical operation 10.12.12.93 & 255.0.0.0 and produce the value 10.0.0.0. This value would be compared against bindnetaddr and bind Totem to the NIC that matches. This can cause confusion if netmask or bindnetaddr are not set properly. In the example above, if bindnetaddr is 10.12.12.0, the network interface will never be matched. If bindnetaddr is 10.0.0.0 the interface will be matched.
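If you want to double-check what Totem will compute for your own interface, here is a minimal bash sketch (the address and netmask are this post's example values) that ANDs each octet of the IP with the netmask:

#!/bin/bash
# Compute bindnetaddr = interface IP AND netmask, octet by octet
ip=192.168.1.101
mask=255.255.255.0
IFS=. read -r i1 i2 i3 i4 <<< "$ip"
IFS=. read -r m1 m2 m3 m4 <<< "$mask"
# Prints 192.168.1.0, the bindnetaddr used above
echo "$(( i1 & m1 )).$(( i2 & m2 )).$(( i3 & m3 )).$(( i4 & m4 ))"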
Copy corosync.conf to Node 2:

scp /etc/corosync/corosync.conf 192.168.1.102:/etc/corosync
Create a pacemaker file so that Corosync will automatically load Pacemaker when it's started:

touch /etc/corosync/service.d/pacemaker

Put this config in that file:

service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 0
}
Copy the pacemaker file to Node 2:

scp /etc/corosync/service.d/pacemaker 192.168.1.102:/etc/corosync/service.d/

Start Corosync and let the magic begin:
/etc/rc.d/rc.corosync start

Check your log for any errors:
tail -f /var/log/corosync

Check your process list:
ps auxf

Corosync should also load the other processes automatically:
root  2008  0.5  3.4  52668  3964 ?  Ssl  13:55  0:00 corosync
root  2015  0.0  1.9  12140  2248 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/stonithd
226   2016  0.3  3.3  13004  3796 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/cib
root  2017  0.0  1.6   6812  1848 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/lrmd
226   2018  0.1  2.2  12404  2540 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/attrd
226   2019  0.0  1.7   8664  2032 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/pengine
226   2020  0.1  2.5  12528  2904 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/crmd

Monitor your cluster using Pacemaker tools:
crm status

It should show something like this:
============
Last updated: Sun May 13 13:57:43 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node1 node2 ]

Give them some time to come online if they're offline.
Put the main configuration options into your cluster:
crm configure
property stonith-enabled=false
property no-quorum-policy=ignore
commit
quit

If you get an error such as ERROR: cib-bootstrap-options: attribute last-lrm-refresh does not exist, just proceed. It may be a bug.
We had to disable stonith since we just want Pacemaker up and running. However, in a real production environment you really need to configure stonith; you can read more about it here. We also need to ignore the quorum policy since we're only using two nodes; you can read more about it here.
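When you do get around to configuring stonith for real, the crm shell can list the fencing agents your build ships with, which is a useful starting point (a standard crm shell query):

# List the fencing (stonith) resource agents this build provides
crm ra list stonith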
You can see your new configuration by running:
crm configure show

Which will output:
node node1
node node2
property $id="cib-bootstrap-options" \
        dc-version="1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        last-lrm-refresh="1336919205" \
        no-quorum-policy="ignore"
If you accidentally commit a wrong configuration and don't know how to fix it, you can use crm configure edit to change the configuration directly, but this method is strongly discouraged since it's error-prone.

It's time to configure our main/failover/cluster IP (our clients will use this IP, not the node IPs):
crm configure
primitive ip ocf:heartbeat:IPaddr params ip="192.168.1.100" op monitor interval=10s
commit
If everything goes well, you should be able to ping the cluster IP (192.168.1.100), and crm status should yield this result:

============
Last updated: Sun May 13 14:28:19 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node1 node2 ]

ip (ocf::heartbeat:IPaddr): Started node1
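A quick sanity check from either node (the interface name is an assumption; the IPaddr agent typically adds the address as an alias such as eth0:0):

# The cluster IP should answer pings now
ping -c 3 192.168.1.100

# On the node that currently holds the resource, look for the alias
/sbin/ifconfig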
We'll now set up MySQL monitoring with Pacemaker. But before that, make sure you have:

- Installed MySQL on both nodes.

- Verified that you can connect to MySQL from somewhere other than localhost:

mysql -u root -p -h 192.168.1.101
mysql -u root -p -h 192.168.1.102

You can use this command to allow any host to connect to your MySQL:

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'password' WITH GRANT OPTION;
FLUSH PRIVILEGES;

- Created a database on each node, for example a database named node1 on Node 1 and node2 on Node 2. This is just for verification.

Add this resource:
crm configure
primitive mysql ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" user="mysql" \
        pid="/var/run/mysql/mysql.pid" datadir="/var/lib/mysql" \
        socket="/var/run/mysql/mysql.sock" \
        op monitor interval="30s" timeout="30s" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120"
commit
quit

The parameters above are based purely on the standard Slackware MySQL package.
So make sure you've created /etc/my.cnf, which does not exist by default. Just copy it from the sample file:

cp /etc/my-small.cnf /etc/my.cnf
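If your MySQL layout differs from the stock Slackware package, you can review every parameter the resource agent accepts before committing anything (a standard crm shell query):

crm ra info ocf:heartbeat:mysql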
Your latest crm status should show something like this:

============
Last updated: Mon May 14 01:13:23 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ node1 node2 ]

ip (ocf::heartbeat:IPaddr): Started node1
mysql (ocf::heartbeat:mysql): Started node2
As you can see, mysql has been started on Node 2. It actually doesn't matter on which node it starts first (for this tutorial, not for a production server); what's important is that if one of the nodes goes down, the other node starts its MySQL automatically. You can test this by running these commands on Node 2 to simulate a node failure:

crm
node standby
quit

crm status should show something like this (give Node 1 some time before it starts its MySQL):

============
Last updated: Mon May 14 01:21:12 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Node node2: standby
Online: [ node1 ]

ip (ocf::heartbeat:IPaddr): Started node1
mysql (ocf::heartbeat:mysql): Started node1
Right now, your clients can use the cluster IP (192.168.1.100) to connect to MySQL. A client won't know which node it is connected to. In this case, it will connect to Node 2 if both nodes are online. If Node 2 is offline, 192.168.1.100 will automatically connect the client to the MySQL on 192.168.1.101. If Node 1 is offline, 192.168.1.100 will automatically use the MySQL on Node 2, which is at 192.168.1.102.
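Since you created the node1 and node2 verification databases earlier, a quick way to see which node a client actually reached is to list the databases through the cluster IP (a sketch using the stock mysql client):

# The presence of the node1 or node2 database tells you which node answered
mysql -u root -p -h 192.168.1.100 -e "SHOW DATABASES;"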
To bring Node 2 back online, just run these commands on Node 2:

crm
node online
quit
However, usually you want to control on which node MySQL comes up first, Node 1 or Node 2, and to guarantee that mysql runs on the same node as the cluster ip (otherwise clients connecting to 192.168.1.100 could land on a node where MySQL isn't running). To make this happen, you need to use colocation:

crm configure
colocation ip-mysql inf: ip mysql
commit
quit

crm status should now show something like this:

============
Last updated: Mon May 14 01:26:41 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ node1 node2 ]

ip (ocf::heartbeat:IPaddr): Started node1
mysql (ocf::heartbeat:mysql): Started node1
That means mysql has been started on Node 1. So, every time corosync is started on both nodes, mysql will be started on Node 1, due to the colocation configuration.
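If you'd rather state the preferred node explicitly instead of relying on where ip happened to start, a location constraint can express that preference (a sketch; the constraint name ip-prefer-node1 and the score of 100 are illustrative, not part of this setup):

crm configure
location ip-prefer-node1 ip 100: node1
commit
quit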
Try turning off Node 1 or Node 2 and watch how MySQL switches sides between the nodes.
I think that’s it, next tutorial should be mainly about DRBD. Good luck!