====== High availability with Heartbeat and DRBD on Debian ======
===== Author: Philipp Noack =====
Note: If you clone the first server after the setup, the cluster will NOT work. The second server has to be installed exactly like the first one!
------------------------------------------------------------------------------------------------------------
1. Debian installation:
My setup was:
/boot ext3 with 100 MB and boot-flag
/ ext3 with 5 GB
swap with 4 GB (depending on memory)
/var ext3 with 5 GB
/var2 with the rest of the space (partition created, but NOT formatted yet!)
2. Install a bigmem kernel (if the server has more than 4 GB of memory):
Run "apt-cache search linux-image bigmem" and install the kernel that matches your architecture.
3. Network configuration in /etc/network/interfaces (eth1 is used for DRBD (RAID over TCP/IP) and Heartbeat):
auto lo eth0 eth1
iface lo inet loopback
iface eth0 inet static
address 172.30.86.???
netmask 255.255.255.0
network 172.30.86.0
broadcast 172.30.86.255
gateway 172.30.86.1
iface eth1 inet static
address 192.168.1.???
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255
4. Install heartbeat:
aptitude install heartbeat
5. Install DRBD:
Add the Debian backports repository for DRBD 8 to /etc/apt/sources.list (not needed since lenny, where DRBD 8 is included):
deb http://www.backports.org/debian etch-backports main contrib non-free
Install the packages drbd8-source and drbd8-utils:
aptitude -t etch-backports install drbd8-source
aptitude -t etch-backports install drbd8-utils
Build the kernel module (this has to be redone whenever the kernel is updated):
module-assistant auto-install drbd8
Reboot into the new kernel.
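Because the module has to be rebuilt after every kernel update, a quick check before rebooting can help. This is a minimal sketch, not part of the original setup: the function name drbd_module_built is hypothetical, and it simply searches the whole module tree of a kernel version for drbd.ko, since the exact subdirectory module-assistant installs into may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a drbd.ko module exists anywhere below the
# module tree of the given kernel version (default: the running kernel).
# Optional second argument: alternative modules root (default /lib/modules).
drbd_module_built() {
    kver="${1:-$(uname -r)}"
    modroot="${2:-/lib/modules}"
    [ -d "$modroot/$kver" ] || return 1
    # find prints matching paths; grep -q turns "any output" into the exit status
    find "$modroot/$kver" -name 'drbd.ko' 2>/dev/null | grep -q .
}
```

Usage, e.g. after installing a new kernel package: drbd_module_built <new-version> || module-assistant auto-install drbd8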
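6. Edit the DRBD config /etc/drbd.conf (official documentation: http://www.drbd.org/docs/install/). Here is my config as an example:
Important: the disk lines must point to your var2 partition. You will find its name in /etc/fstab (or with "fdisk -l"); mine was /dev/cciss/c0d0p7, so adjust the disk lines below to your partition.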
global {
usage-count yes;
}
common {
syncer { rate 700000K; }
}
resource r0 {
protocol C;
handlers {
pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
}
startup {
degr-wfc-timeout 120; # 2 minutes.
}
disk {
on-io-error detach;
no-disk-flushes;
}
net {
after-sb-0pri disconnect;
after-sb-1pri disconnect;
after-sb-2pri disconnect;
rr-conflict disconnect;
}
syncer {
al-extents 257;
}
on mbops01 {
device /dev/drbd0;
disk /dev/cciss/c0d0p6;
address 192.168.1.1:7788;
meta-disk internal;
}
on mbops02 {
device /dev/drbd0;
disk /dev/cciss/c0d0p6;
address 192.168.1.2:7788;
meta-disk internal;
}
}
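DRBD picks the "on <host>" section whose name equals the output of "uname -n", so a typo there means the node finds no configuration for itself. A minimal sketch of a check; the function names drbd_hosts and drbd_host_configured are hypothetical, and the parsing assumes each "on" section starts on its own line as in the config above.

```shell
#!/bin/sh
# Print the host names used in the "on <host> {" sections of a DRBD config.
# Lines such as "on-io-error detach;" are skipped because their first field
# is not exactly "on".
drbd_hosts() {
    awk '$1 == "on" { sub(/[{].*/, "", $2); print $2 }' "${1:-/etc/drbd.conf}"
}

# Succeed if the given host (default: uname -n) has its own "on" section.
drbd_host_configured() {
    host="${2:-$(uname -n)}"
    drbd_hosts "$1" | grep -qx -- "$host"
}
```

Run drbd_host_configured on both servers; in addition, "drbdadm dump all" lets drbdadm itself parse the file and report syntax errors.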
Adjust the permissions of the DRBD tools, so that Heartbeat processes running in group haclient can call them:
chgrp haclient /sbin/drbdsetup
chmod o-x /sbin/drbdsetup
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta
chmod o-x /sbin/drbdmeta
chmod u+s /sbin/drbdmeta
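To confirm that the chgrp/chmod commands took effect, a small sketch using GNU stat; the function name drbd_tool_perms_ok is hypothetical. It checks the group, the setuid bit and that "others" cannot execute the file.

```shell
#!/bin/sh
# Succeed if FILE belongs to GROUP (default haclient), has the setuid bit set
# and is not executable for "others". Uses GNU stat (%G = group, %A = mode).
drbd_tool_perms_ok() {
    file="$1"
    group="${2:-haclient}"
    [ "$(stat -c %G "$file")" = "$group" ] || return 1
    mode="$(stat -c %A "$file")"          # e.g. -rwsr-xr--
    case "$mode" in
        ???[sS]*) ;;                      # 4th character: setuid bit
        *) return 1 ;;
    esac
    case "$mode" in
        *[xt]) return 1 ;;                # last character: others may execute
    esac
    return 0
}
```

Usage: for f in /sbin/drbdsetup /sbin/drbdmeta; do drbd_tool_perms_ok "$f" || echo "check $f"; done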
7. Initialize DRBD (on both machines):
drbdadm create-md r0
Start DRBD with "/etc/init.d/drbd start" and check the state with "cat /proc/drbd".
Warning: Do this step on the master server ONLY!
drbdadm -- --overwrite-data-of-peer primary r0
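The initial full sync can take a long time. A minimal sketch for waiting until it has finished; the function name drbd_synced is hypothetical, and it assumes r0 is minor 0 (/dev/drbd0) and looks for the "ds:UpToDate/UpToDate" disk state in /proc/drbd.

```shell
#!/bin/sh
# Succeed once the line for minor 0 in /proc/drbd (or the given file)
# reports both disks as UpToDate, i.e. the initial sync has finished.
drbd_synced() {
    grep -q '^ *0:.*ds:UpToDate/UpToDate' "${1:-/proc/drbd}"
}
```

Usage: until drbd_synced; do sleep 30; done; echo "initial sync finished"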
8. Create the filesystem on /dev/drbd0 (again, on the master server only):
mkfs -t ext3 /dev/drbd0
9. Install Opsview including Apache2 (see also the official Debian installation guide at http://docs.opsview.org/doku.php?id=opsview2.14:debian-installation):
Add the following lines to /etc/apt/sources.list:
deb http://apt.opsview.org/debian etch main
deb http://ftp.debian.org/debian etch non-free
Then run "apt-get update" and "apt-get install opsview".
Quoting the original documentation: "Once Opsview has been installed, a Catalyst web server should be listening on port 3000. The Apache web server can then be used as a proxy to make Opsview available on port 80 (http) - this also provides a significant improvement in performance as static content is then served directly by apache rather than via the perl Catalyst web server."
apt-get install libapache2-mod-proxy-html
a2enmod proxy
a2enmod proxy_http
a2enmod proxy_html
/etc/init.d/apache2 force-reload
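To see whether Catalyst (port 3000) and Apache (port 80) are actually listening, a sketch that reads the kernel's socket tables directly; the function name port_listening is hypothetical. It assumes the Linux /proc/net/tcp format, where column 2 is the local address as HEXADDR:HEXPORT and column 4 the state (0A = LISTEN), and it also reads /proc/net/tcp6 because Apache may listen on IPv6 only.

```shell
#!/bin/sh
# Succeed if a TCP socket is in LISTEN state on PORT. Optional further
# arguments: files in /proc/net/tcp format (default: tcp and tcp6).
port_listening() {
    port_hex="$(printf '%04X' "$1")"; shift
    [ $# -gt 0 ] || set -- /proc/net/tcp /proc/net/tcp6
    for f in "$@"; do
        [ -r "$f" ] || continue
        # the port is the part after the last ":" of the local address
        if awk -v p="$port_hex" '$4 == "0A" { n = split($2, a, ":"); if (a[n] == p) found = 1 } END { exit !found }' "$f"; then
            return 0
        fi
    done
    return 1
}
```

Usage: port_listening 3000 && port_listening 80 && echo "Catalyst and Apache are listening"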
10. Remove Opsview, MySQL and Apache from the runlevels so that they are no longer started at boot (Heartbeat starts them now; the opsview-agent is replaced by the Debian NRPE server later):
update-rc.d -f opsview remove
update-rc.d -f opsview-web remove
update-rc.d -f mysql remove
update-rc.d -f apache2 remove
update-rc.d -f opsview-agent remove
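A quick way to confirm that no boot links are left behind is to look for S/K links in the rc directories. A minimal sketch; the function name leftover_rc_links is hypothetical, and the optional ROOT prefix only exists so the check can be run against a copy of /etc.

```shell
#!/bin/sh
# Print any rc?.d links that would still start or stop the given services.
# Usage: leftover_rc_links ROOT service...   (ROOT is "" for the live system)
leftover_rc_links() {
    root="$1"; shift
    for svc in "$@"; do
        for link in "$root"/etc/rc[0-6S].d/[SK][0-9][0-9]"$svc"; do
            # an unmatched glob stays literal, so only print existing links
            if [ -e "$link" ] || [ -L "$link" ]; then echo "$link"; fi
        done
    done
}
```

Usage: leftover_rc_links "" opsview opsview-web mysql apache2 opsview-agent (no output means all links are gone).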
11. Configure heartbeat: /etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 2
deadtime 30
warntime 10
initdead 120
auto_failback off
bcast eth1
# Ping node in our network: ipfail uses it to check which server still has network connectivity
ping 172.30.86.4
node mbops01
node mbops02
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
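Heartbeat identifies the cluster members by the names on the "node" lines, which have to match "uname -n" on each server. A minimal sketch of a check; the function name ha_node_listed is hypothetical.

```shell
#!/bin/sh
# Succeed if HOST (default: this machine's uname -n) appears on a "node"
# line of the given ha.cf (default /etc/ha.d/ha.cf).
ha_node_listed() {
    host="${2:-$(uname -n)}"
    awk '$1 == "node" { for (i = 2; i <= NF; i++) print $i }' \
        "${1:-/etc/ha.d/ha.cf}" | grep -qx -- "$host"
}
```

Run it on both servers; if it fails, fix either the node lines or the host name.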
The file /etc/ha.d/haresources:
mbops01 drbddisk::r0 Filesystem::/dev/drbd0::/var2::ext3 172.30.86.170 mysql opsview opsview-web apache2
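Plain names such as mysql or apache2 in haresources are run through their init scripts, so each of them needs a script in /etc/init.d. A sketch that lists the missing ones; the function name missing_init_scripts is hypothetical, and it assumes (as in the line above) that entries containing "::" are Heartbeat resource agents from /etc/ha.d/resource.d and bare IP addresses are handled by Heartbeat itself, so both are skipped.

```shell
#!/bin/sh
# List services named in haresources that have no executable script in the
# init directory. Field 1 of each line is the preferred node and is skipped.
missing_init_scripts() {
    haresources="${1:-/etc/ha.d/haresources}"
    initdir="${2:-/etc/init.d}"
    grep -v '^[[:space:]]*#' "$haresources" | while read -r node resources; do
        for res in $resources; do
            case "$res" in
                *::*) continue ;;                               # resource agent with arguments
                *[!0-9.]*) [ -x "$initdir/$res" ] || echo "$res" ;;  # skip bare IPs
            esac
        done
    done
}
```

Usage: missing_init_scripts (no output means every service has an init script).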
The file /etc/ha.d/authkeys:
auth 3
3 md5 anypassword
Then set the file permissions:
chmod 600 /etc/ha.d/authkeys
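Instead of a fixed password like "anypassword", the key can be generated randomly. A minimal sketch that writes the same "auth 3 / 3 md5 ..." format as above; the function name make_authkeys is hypothetical, and the key is simply the md5sum of 512 random bytes.

```shell
#!/bin/sh
# Write an authkeys file with a random md5 key and restrict it to mode 600.
make_authkeys() {
    target="${1:-/etc/ha.d/authkeys}"
    key="$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum | cut -d' ' -f1)"
    # umask in a subshell, so the caller's umask is left unchanged
    ( umask 077; printf 'auth 3\n3 md5 %s\n' "$key" > "$target" )
    chmod 600 "$target"    # also covers a file that already existed
}
```

The file has to be identical on both nodes: generate it once, copy it to the second server and check that it is mode 600 there as well.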
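12. Moving data to /var2 (on the master server, with /dev/drbd0 mounted on /var2; stop MySQL and Opsview first):
cd /usr/local/
tar cvzf nagios.tar.gz nagios
mv nagios.tar.gz /var2
rm -r nagios
cd /var2
tar xvzf nagios.tar.gz
ln -s /var2/nagios /usr/local/nagios
Do the same with /usr/local/opsview-web (symlink to /var2/opsview-web) and /var/lib/mysql (symlink to /var2/mysql).
On the second server, remove the local directories as well and create the same symlinks, so that the services find their data after a takeover.
13. Replace the NRPE agent (do this on both machines, each while it is in primary mode, because the configuration is copied from /var2)
The opsview-agent needs the /var2 partition to run, so it cannot run on the passive node; use the Nagios NRPE server from Debian instead.
Install the Nagios NRPE server and plugins:
apt-get install nagios-nrpe-server nagios-plugins-basic
rm /etc/nagios/nrpe.cfg
cp /var2/nagios/etc/nrpe.cfg /etc/nagios/nrpe.cfg
Now edit the plugin paths in nrpe.cfg: replace /usr/local/nagios/libexec with /usr/lib/nagios/plugins.
vim /etc/nagios/nrpe.cfg
/etc/init.d/nagios-nrpe-server restart
14. Solving problems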
- Disk-flush errors on RAID systems
Add the line "no-disk-flushes;" to the disk section of the resource in /etc/drbd.conf:
resource r0 {
  disk {
    no-disk-flushes;
    ...
  }
}
- Apache2 proxy doesn't work:
cp /usr/share/doc/opsview/apache2-proxy.conf /etc/apache2/sites-available/opsview
ln -s /etc/apache2/sites-available/opsview /etc/apache2/sites-enabled/opsview
Customize the config file (remove the comments and adjust the IPs). Do this on both machines, then trigger a Heartbeat takeover to restart apache2.
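- MySQL doesn't start/stop on a node:
The passwords in /etc/mysql/debian.cnf have to be identical on both nodes, because the debian-sys-maint user they belong to is stored in the shared MySQL database on /var2.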
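To compare the two files without opening them side by side, a minimal sketch that extracts the first "password = ..." value; the function name debian_cnf_password is hypothetical and assumes the usual "key = value" layout of debian.cnf.

```shell
#!/bin/sh
# Print the first "password" value from a Debian MySQL maintenance config
# (default /etc/mysql/debian.cnf).
debian_cnf_password() {
    awk -F '[ \t]*=[ \t]*' '$1 == "password" { print $2; exit }' \
        "${1:-/etc/mysql/debian.cnf}"
}
```

Usage, assuming a copy of the other node's file was fetched to /tmp/peer-debian.cnf (e.g. via scp): [ "$(debian_cnf_password)" = "$(debian_cnf_password /tmp/peer-debian.cnf)" ] || echo "passwords differ"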
- Wipe the filesystem if there are problems with it (this destroys the data on the partition!):
dd if=/dev/zero bs=1M count=1 of=/dev/cciss/????; sync
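Since a wrong device name here destroys data, a guard before the dd can help. A sketch with a hypothetical function wipe_allowed: it refuses anything that is not a block device or that appears in /proc/mounts, assuming /proc/mounts lists the device under the same name you pass in. It does not notice a backing device that is still attached to DRBD, so run "drbdadm down r0" first.

```shell
#!/bin/sh
# Succeed only if DEV is a block device that is not currently mounted.
wipe_allowed() {
    dev="$1"
    [ -b "$dev" ] || { echo "$dev is not a block device" >&2; return 1; }
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' /proc/mounts; then
        echo "$dev is mounted" >&2
        return 1
    fi
    return 0
}
```

Usage, with DEV set to your partition: wipe_allowed "$DEV" && dd if=/dev/zero bs=1M count=1 of="$DEV" && sync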
- Problems with resources r1, r2 ...:
Delete the line 'after "r2";' in /etc/drbd.conf.