Corosync and Pacemaker in Slackware
This will be a multi-part post about a high availability solution for Slackware. My first post will be about Corosync and Pacemaker.
You need to combine Corosync and Pacemaker with a distributed storage system such as DRBD/OCFS2/GFS. I’ll talk about those stacks in another post.
GOAL:
- A MySQL server will always be available at the same IP even if the node serving it goes down (another server will take over automatically without the need for manual intervention).
Environment:
Slackware v13.37
Two nodes will be used:
Node 1: 192.168.1.101
Node 2: 192.168.1.102
Cluster/Main/Failover IP: 192.168.1.100
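Throughout this post the nodes show up as node1 and node2 (that’s what crm status will print later), so I’m assuming each hostname resolves to the right address on both machines. If you don’t have DNS for them, something like this in /etc/hosts on both nodes will do (the hostnames here are my assumption based on the output later in the post):
192.168.1.101   node1
192.168.1.102   node2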
The MySQL data is not synchronized; this post is just about Corosync and Pacemaker.
Guide:
Download and install these packages (in this order) on both nodes:
http://slackbuilds.org/repository/13.37/libraries/libnet/
http://slackbuilds.org/repository/13.37/libraries/libesmtp/
http://slackbuilds.org/repository/13.37/system/clusterglue/
http://slackbuilds.org/repository/13.37/system/clusterresourceagents/
http://slackbuilds.org/repository/13.37/system/corosync/
http://slackbuilds.org/repository/13.37/system/pacemaker/
I strongly suggest you build these packages one by one just to be sure there are no missing dependencies. BTW, some script adjustments are needed for Cluster Resource Agents, but I’m sure you can handle it ;-)
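In case you haven’t used SlackBuilds before, the build-and-install cycle for each package looks roughly like this (shown for libnet; the exact package filename produced under /tmp depends on the version and architecture you download):
# download the SlackBuild archive and the matching source tarball from the links above
tar xvf libnet.tar.gz
cd libnet
# drop the libnet source tarball into this directory, then build as root
./libnet.SlackBuild
# install the resulting package (filename will vary)
installpkg /tmp/libnet-*_SBo.t?z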
It would be easier for the next steps if password-less login with OpenSSH is enabled. On Node 1:
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.1.102
Generate an authentication key for Corosync:
corosync-keygen
corosync-keygen gathers entropy from /dev/random, so if you’re connecting remotely, pressing your keyboard won’t do any good; remote keystrokes never reach the server’s entropy pool. The fastest way would be typing directly on the server’s console. The other way is to generate some disk activity, for example by running find . on your / directory in another session (press Control + C when the key has been generated).
Copy the newly generated authentication key to Node 2:
scp /etc/corosync/authkey 192.168.1.102:/etc/corosync
Copy the default corosync configuration file:
cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
Replace bindnetaddr and logfile (optional):
bindnetaddr: 192.168.1.0
logfile: /var/log/corosync
You can check the reference for the meaning of those values. From the corosync documentation:
If the local interface was 10.12.12.93 and the netmask was 255.0.0.0, Totem would execute the logical operation 10.12.12.93 & 255.0.0.0 and produce the value 10.0.0.0. This value would be compared against bindnetaddr and bind Totem to the NIC that matches. This can cause confusion if netmask or bindnetaddr are not set properly. In the example above, if bindnetaddr is 10.12.12.0, the network interface will never be matched. If bindnetaddr is 10.0.0.0 the interface will be matched.
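For orientation, here is roughly what the relevant pieces of corosync.conf look like after the edit. Everything except bindnetaddr and the logfile setting is whatever the shipped example file already contains; the multicast values below are just the usual example defaults, not something you need to change:
totem {
        version: 2
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

logging {
        to_logfile: yes
        logfile: /var/log/corosync
        to_syslog: yes
}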
Copy corosync.conf to Node 2:
scp /etc/corosync/corosync.conf 192.168.1.102:/etc/corosync
Create a pacemaker file so that Corosync will automatically load Pacemaker when it’s started:
touch /etc/corosync/service.d/pacemaker
Put this config in that file:
service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver: 0
}
Copy the pacemaker file to Node 2:
scp /etc/corosync/service.d/pacemaker 192.168.1.102:/etc/corosync/service.d/
Start Corosync on both nodes and let the magic begin:
/etc/rc.d/rc.corosync start
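If you also want Corosync to come up at boot, the usual Slackware way (my own habit, not something the package sets up for you) is a small snippet in /etc/rc.d/rc.local:
# /etc/rc.d/rc.local
if [ -x /etc/rc.d/rc.corosync ]; then
  /etc/rc.d/rc.corosync start
fi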
Check your log for any error:
tail -f /var/log/corosync
Check your process list:
ps auxf
Corosync should also load other processes automatically:
root   2008  0.5  3.4  52668  3964 ?  Ssl  13:55  0:00 corosync
root   2015  0.0  1.9  12140  2248 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/stonithd
226    2016  0.3  3.3  13004  3796 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/cib
root   2017  0.0  1.6   6812  1848 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/lrmd
226    2018  0.1  2.2  12404  2540 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/attrd
226    2019  0.0  1.7   8664  2032 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/pengine
226    2020  0.1  2.5  12528  2904 ?  S    13:55  0:00  \_ /usr/lib/heartbeat/crmd
Monitor your cluster using Pacemaker tools:
crm status
It should be something like this:
============
Last updated: Sun May 13 13:57:43 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node1 node2 ]
Give the nodes some time to come online if they show as offline.
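Instead of re-running crm status over and over, you can also watch the cluster state update live with crm_mon, which ships with Pacemaker (press Ctrl+C to exit):
crm_mon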
Apply some basic configuration to your cluster:
crm configure
property stonith-enabled=false
property no-quorum-policy=ignore
commit
quit
If you’re getting an error such as ERROR: cib-bootstrap-options: attribute last-lrm-refresh does not exist, just proceed; it may be a bug.
We had to disable stonith since we just want our Pacemaker to be running. However, in a real production environment you really need to configure stonith; you can read more about it here.
We also need to ignore the quorum policy since we’re only using 2 nodes; you can read more about it here.
You can see your new configuration by running:
crm configure show
Which will output:
node node1
node node2
property $id="cib-bootstrap-options" \
        dc-version="1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        last-lrm-refresh="1336919205" \
        no-quorum-policy="ignore"
If you accidentally put in a wrong configuration and don’t know how to fix it, you can use crm configure edit to change your configuration directly, but this method is strongly discouraged since it’s error-prone.
It’s time to configure our main/failover/cluster IP (our clients will use this IP, not the node IPs):
crm configure
primitive ip ocf:heartbeat:IPaddr params ip="192.168.1.100" op monitor interval=10s
commit
If everything goes well, you should be able to ping the cluster IP (192.168.1.100) and crm status should yield this result:
============
Last updated: Sun May 13 14:28:19 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ node1 node2 ]

ip      (ocf::heartbeat:IPaddr):        Started node1
We’ll now set up MySQL monitoring with Pacemaker. But before that, make sure you have:
- Installed MySQL on both nodes.
- Made sure you can connect to your MySQL from a host other than localhost:
mysql -u root -p -h 192.168.1.101
mysql -u root -p -h 192.168.1.102
You can use this command to allow any host to connect to your MySQL:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'password' WITH GRANT OPTION;
FLUSH PRIVILEGES;
- Created a database on both Node 1 and Node 2, for example a database named node1 on Node 1 and node2 on Node 2. This is just for verification.
Add this resource:
crm configure
primitive mysql ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" user="mysql" pid="/var/run/mysql/mysql.pid" datadir="/var/lib/mysql" socket="/var/run/mysql/mysql.sock" \
        op monitor interval="30s" timeout="30s" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120"
commit
quit
The parameters above are based purely on the standard Slackware MySQL package. So make sure you’ve created /etc/my.cnf, which is not available by default. Just copy it from the sample file:
cp /etc/my-small.cnf /etc/my.cnf
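One thing worth pointing out (my own suggestion, not part of the original steps): since Pacemaker will start and stop mysqld through the resource agent, you probably don’t want Slackware’s init scripts launching MySQL at boot as well. Making rc.mysqld non-executable on both nodes keeps rc.M from starting it:
chmod -x /etc/rc.d/rc.mysqld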
Your latest crm status would show something like this:
============
Last updated: Mon May 14 01:13:23 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ node1 node2 ]

ip      (ocf::heartbeat:IPaddr):        Started node1
mysql   (ocf::heartbeat:mysql):         Started node2
As you can see, mysql has been started on Node 2. It actually doesn’t matter on which node it starts first (for this tutorial, not for a production server); what’s important is that if one of the nodes goes down, the other node should start its MySQL automatically. You can test this situation by running these commands on Node 2 to simulate a node failure:
crm node
standby
quit
crm status would show something like this (give Node 1 some time before it starts its MySQL):
============
Last updated: Mon May 14 01:21:12 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Node node2: standby
Online: [ node1 ]

ip      (ocf::heartbeat:IPaddr):        Started node1
mysql   (ocf::heartbeat:mysql):         Started node1
Right now, your clients can use the cluster IP (192.168.1.100) to connect to your MySQL. A client won’t know which node it is connected to. In this case, it will connect to Node 2 if both nodes are online. If Node 2 is offline, 192.168.1.100 will automatically connect the client to the MySQL on 192.168.1.101. If Node 1 is offline, 192.168.1.100 will automatically use the MySQL on Node 2, which is at 192.168.1.102.
To bring Node 2 back online, just run these commands on Node 2:
crm node
online
quit
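This is where the node1/node2 verification databases created earlier come in handy: connect through the cluster IP and see which one shows up (assuming the GRANT above was applied on both nodes):
mysql -u root -p -h 192.168.1.100 -e 'SHOW DATABASES;'
# seeing 'node1' means you reached Node 1, 'node2' means Node 2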
However, usually you want to control which MySQL will be up, on Node 1 or on Node 2, and you certainly want it running on the same node that holds the cluster IP. To make this happen, you need to use colocation:
crm configure
colocation ip-mysql inf: ip mysql
commit
quit
crm status would show something like this:
============
Last updated: Mon May 14 01:26:41 2012
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.1-b9b672590e79770afb63b9b455400d92fb6b5d9e
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ node1 node2 ]

ip      (ocf::heartbeat:IPaddr):        Started node1
mysql   (ocf::heartbeat:mysql):         Started node1
That means your mysql has been started on Node 1. So, every time corosync is started on both nodes, mysql will be started on Node 1 due to the colocation configuration.
Try turning off Node 1 or Node 2 and watch MySQL switch sides between the nodes.
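Besides actually powering a node off, a quick way to simulate a failure is to just stop Corosync on that node, watch crm status on the survivor, and then start it again afterwards (the rc script from the corosync SlackBuild should support stop as well):
/etc/rc.d/rc.corosync stop
# ...check crm status on the other node, then bring this one back:
/etc/rc.d/rc.corosync start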
I think that’s it; the next tutorial should be mainly about DRBD. Good luck!