Wednesday, April 8, 2009

heartbeat+DRBD+MySQL (Part 3)


Log in to node-a.


[root@node-a ha.d]# /etc/rc.d/init.d/heartbeat start
Starting High-Availability services:
[ OK ]
[root@node-a ha.d]# chkconfig heartbeat on


[root@node-a ~]# ip addr show eth0 <-- the virtual IP 192.168.1.10 is now active
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:8c:0c:be brd ff:ff:ff:ff:ff:ff
inet 192.168.1.11/24 brd 192.168.1.255 scope global eth0
inet 192.168.1.10/24 brd 192.168.1.255 scope global secondary eth0
inet6 fe80::20c:29ff:fe8c:cbe/64 scope link
valid_lft forever preferred_lft forever
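
The same check can be scripted rather than read by eye. The following is a minimal sketch, not part of the original setup: `has_vip` is a hypothetical helper name, and it only assumes that `ip addr show` prints IPv4 addresses on lines beginning with `inet`, as in the output above.

```shell
#!/bin/sh
# has_vip ADDRESS
# Reads `ip addr show` output on stdin and succeeds (exit 0) only if
# ADDRESS is configured as an IPv4 address. (Hypothetical helper.)
has_vip() {
    awk -v vip="$1" '
        # Compare the address part before the "/" exactly, so that
        # 192.168.1.10 does not match 192.168.1.100.
        $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }'
}

# Usage on a cluster node (interface and address as used in this article):
#   ip addr show eth0 | has_vip 192.168.1.10 && echo "VIP is active here"
```

Running it on both nodes should report the virtual IP on exactly one of them.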

[root@node-a ~]# cd /mnt/mysql
[root@node-a mysql]# ls
ibdata1 ib_logfile0 ib_logfile1 lost+found mysql test

[root@node-a ~]# tail -f /var/log/ha-debug <-- the log shows that all configured resources started normally
IPaddr2[2296][2331]: 2009/03/31_15:18:10 INFO: ip -f inet addr add 192.168.1.10/24 brd 192.168.1.255 dev eth0
IPaddr2[2296][2333]: 2009/03/31_15:18:10 INFO: ip link set eth0 up
IPaddr2[2296][2335]: 2009/03/31_15:18:11 INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.1.10 eth0 192.168.1.10 auto not_used not_used
IPaddr2[2267][2338]: 2009/03/31_15:18:11 INFO: Success
Filesystem[2456][2486]: 2009/03/31_15:18:13 INFO: Running start for /dev/drbd0 on /mnt/mysql
Filesystem[2456][2491]: 2009/03/31_15:18:13 INFO: Starting filesystem check on /dev/drbd0
Filesystem[2445][2502]: 2009/03/31_15:18:13 INFO: Success
ResourceManager[2171][2503]: 2009/03/31_15:18:13 debug: /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/mysql start done. RC=0
ResourceManager[2171][2542]: 2009/03/31_15:18:14 info: Running /etc/init.d/mysqld start
ResourceManager[2171][2543]: 2009/03/31_15:18:14 debug: Starting /etc/init.d/mysqld start
ResourceManager[2171][2652]: 2009/03/31_15:18:17 debug: /etc/init.d/mysqld start done. RC=0
heartbeat[2009]: 2009/03/31_15:18:18 info: Local Resource acquisition completed. (none)
heartbeat[2009]: 2009/03/31_15:18:18 info: local resource transition completed.

Run the following on node-a and node-b. To make testing easier, it opens the database to connections from addresses in the 192.168.1.x range:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'192.168.1.%';

Now test access to the database from the machine at 192.168.1.1:
web01# mysql -h 192.168.1.10 -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.0.45 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql>

The connection succeeds, so the service is usable.
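
While running failover tests, it can help to probe the virtual IP at a fixed interval from the client machine and see roughly how long the service is unreachable. The sketch below is an assumption-laden helper rather than part of the original procedure: `probe_loop` is a hypothetical name, and the probe command is passed in by the caller.

```shell
#!/bin/sh
# probe_loop COUNT INTERVAL CMD [ARGS...]
# Runs CMD COUNT times, sleeping INTERVAL seconds between runs, and
# prints one timestamped UP or DOWN line per run. (Hypothetical helper.)
# The body runs in a subshell so its variables do not leak to the caller.
probe_loop() (
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        if "$@" >/dev/null 2>&1; then state=UP; else state=DOWN; fi
        echo "$(date '+%H:%M:%S') $state"
        i=$((i + 1))
        sleep "$interval"
    done
)

# Usage from the client (PASSWORD is a placeholder):
#   probe_loop 120 1 mysqladmin -h 192.168.1.10 -u root -pPASSWORD ping
```

Note that `mysqladmin ping` reports success whenever the server answers, even if the login is refused, so this only measures reachability of mysqld behind the virtual IP.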


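Finally, run end-to-end tests to check whether takeover works properly when a machine fails.
1. Test shutting a machine down directly

Current state: node-b is the primary machine and node-a is the secondary; both machines are running normally.

On node-b, shut down with: shutdown -h now
The log on node-a shows: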
heartbeat[6911]: 2009/04/05_15:13:46 info: all HA resource acquisition completed (standby).
heartbeat[5513]: 2009/04/05_15:13:46 info: Standby resource acquisition done [all].
heartbeat[7503]: 2009/04/05_15:13:46 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[7503][7509]: 2009/04/05_15:13:46 info: Running /etc/ha.d/rc.d/status status
mach_down[7515][7536]: 2009/04/05_15:13:46 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[7515][7540]: 2009/04/05_15:13:46 info: mach_down takeover complete for node node-b.
heartbeat[5513]: 2009/04/05_15:13:46 info: mach_down takeover complete.
heartbeat[7541]: 2009/04/05_15:13:46 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[7541][7547]: 2009/04/05_15:13:46 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[7541][7553]: 2009/04/05_15:13:46 received ip-request-resp IPaddr2::192.168.1.10/24/eth0/192.168.1.255 OK yes
ResourceManager[7554][7565]: 2009/04/05_15:13:46 info: Acquiring resource group: node-a IPaddr2::192.168.1.10/24/eth0/192.168.1.255 drbddisk::mysql Filesystem::/dev/drbd0::/mnt/mysql mysqld
IPaddr2[7577][7634]: 2009/04/05_15:13:47 INFO: Running OK
Filesystem[7660][7704]: 2009/04/05_15:13:48 INFO: Running OK

[root@node-a ~]# netstat -anpt
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 7481/mysqld
tcp 0 0 192.168.100.11:7789 0.0.0.0:* LISTEN -
tcp 0 0 :::22 :::* LISTEN 1742/sshd
tcp 0 148 ::ffff:192.168.1.11:22 ::ffff:192.168.1.210:3542 ESTABLISHED 1869/0

[root@node-a ~]# /etc/rc.d/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-i386-build, 2008-10-03 11:42:32
m:res cs st ds p mounted fstype
0:mysql WFConnection Primary/Unknown UpToDate/DUnknown C /mnt/mysql ext3
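
The local DRBD role can also be pulled out of this status output in a script. This is a minimal sketch that assumes the column layout shown above (`m:res cs st ds ...`, with `st` given as local/peer); `drbd_local_role` is a hypothetical helper name.

```shell
#!/bin/sh
# drbd_local_role RESOURCE
# Reads `/etc/rc.d/init.d/drbd status` output on stdin and prints the
# local role (the part of the "st" column before the "/") for RESOURCE.
# Fails if the resource is not listed. (Hypothetical helper.)
drbd_local_role() {
    awk -v res="$1" '
        {
            # Device lines start with "<minor>:<resource>", e.g. "0:mysql".
            n = split($1, id, ":")
            if (n == 2 && id[1] ~ /^[0-9]+$/ && id[2] == res) {
                split($3, st, "/")
                print st[1]
                found = 1
            }
        }
        END { exit !found }'
}

# Usage on a cluster node:
#   /etc/rc.d/init.d/drbd status | drbd_local_role mysql
```

With the output above, the resource `mysql` is Primary locally while the peer is Unknown, which matches node-b being shut down.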

node-a detected that node-b was dead and successfully took over from node-b as master.


Now restart node-b.
node-a is still the master, and no split brain was detected.
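
One way to back up the "no split brain" observation is to search the kernel log for DRBD's split-brain message after the peer rejoins. The sketch below assumes DRBD 8.x logs a line containing "Split-Brain detected" when this happens and that kernel messages go to /var/log/messages; `split_brain_seen` is a hypothetical helper name.

```shell
#!/bin/sh
# split_brain_seen
# Reads log text on stdin and succeeds (exit 0) if it contains a DRBD
# split-brain message. The message text is an assumption based on
# DRBD 8.x kernel logging. (Hypothetical helper.)
split_brain_seen() {
    grep -qi 'split-brain detected'
}

# Usage on each node after the peer has rejoined:
#   if split_brain_seen < /var/log/messages; then
#       echo "split brain was logged, check DRBD before continuing"
#   fi
```

Checking `/etc/rc.d/init.d/drbd status` on both nodes is a useful complement: after a clean rejoin the connection state should leave WFConnection and only one side should be Primary.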


Next, shut down node-a:
node-a: shutdown -h now

The log on node-b shows that node-b successfully took over from node-a as master:
Apr 5 15:22:13 node-b kernel: drbd0: role( Secondary -> Primary )
Apr 5 15:22:13 node-b kernel: drbd0: Writing meta data super block now.
Apr 5 15:22:14 node-b Filesystem[2346]: [2390]: INFO: Resource is stopped
Apr 5 15:22:14 node-b ResourceManager[2054]: [2404]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/mysql start
Apr 5 15:22:14 node-b Filesystem[2417]: [2447]: INFO: Running start for /dev/drbd0 on /mnt/mysql
Apr 5 15:22:14 node-b Filesystem[2417]: [2452]: INFO: Starting filesystem check on /dev/drbd0
Apr 5 15:22:15 node-b kernel: kjournald starting. Commit interval 5 seconds
Apr 5 15:22:15 node-b kernel: EXT3 FS on drbd0, internal journal
Apr 5 15:22:15 node-b kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 5 15:22:15 node-b Filesystem[2406]: [2463]: INFO: Success
Apr 5 15:22:15 node-b ResourceManager[2054]: [2503]: info: Running /etc/init.d/mysqld start
Apr 5 15:22:18 node-b heartbeat: [2022]: info: all HA resource acquisition completed (standby).
Apr 5 15:22:18 node-b heartbeat: [1907]: info: Standby resource acquisition done [all].
Apr 5 15:22:18 node-b harc[2614]: [2620]: info: Running /etc/ha.d/rc.d/status status
Apr 5 15:22:19 node-b mach_down[2626]: [2647]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Apr 5 15:22:19 node-b mach_down[2626]: [2651]: info: mach_down takeover complete for node node-a.
Apr 5 15:22:19 node-b heartbeat: [1907]: info: mach_down takeover complete.
Apr 5 15:22:19 node-b harc[2652]: [2658]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
Apr 5 15:22:19 node-b ip-request-resp[2652]: [2664]: received ip-request-resp IPaddr2::192.168.1.10/24/eth0/192.168.1.255 OK yes
Apr 5 15:22:19 node-b ResourceManager[2665]: [2676]: info: Acquiring resource group: node-b IPaddr2::192.168.1.10/24/eth0/192.168.1.255 drbddisk::mysql Filesystem::/dev/drbd0::/mnt/mysql mysqld
Apr 5 15:22:20 node-b IPaddr2[2688]: [2745]: INFO: Running OK

[root@node-b ~]# netstat -anpt
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 2592/mysqld
tcp 0 0 192.168.100.12:7789 0.0.0.0:* LISTEN -
tcp 0 0 192.168.100.12:59833 192.168.100.11:7789 TIME_WAIT -
tcp 0 0 :::22 :::* LISTEN 1719/sshd
tcp 0 0 ::ffff:192.168.1.12:22 ::ffff:192.168.1.210:3682 ESTABLISHED 2003/0



Now restart node-a.
node-b is still the master, and no split brain was detected.



2. Test a network interface failure
Currently node-a is the primary machine and node-b is the secondary.

[root@node-a ~]# ifdown eth0 <-- take eth0 on node-a down

The log on node-b shows:
[root@node-b ~]# tail -f /var/log/messages
Apr 8 16:10:23 node-b heartbeat: [1902]: info: Link node-a:eth0 dead. <-- eth0 is detected as dead
Apr 8 16:10:24 node-b ipfail: [1994]: info: Telling other node that we have more visible ping nodes.
Apr 8 16:10:24 node-b ipfail: [1994]: info: Link Status update: Link node-a/eth0 now has status dead
Apr 8 16:10:26 node-b ipfail: [1994]: info: Asking other side for ping node count.
Apr 8 16:10:26 node-b ipfail: [1994]: info: Checking remote count of ping nodes.
Apr 8 16:10:27 node-b ipfail: [1994]: info: Telling other node that we have more visible ping nodes.
Apr 8 16:10:32 node-b heartbeat: [1902]: info: node-a wants to go standby [all]
<-- node-a switches to standby
Apr 8 16:10:36 node-b kernel: drbd0: peer( Primary -> Secondary )
<-- node-a becomes the secondary machine ( Primary -> Secondary )
Apr 8 16:10:36 node-b heartbeat: [1902]: info: standby: acquire [all] resources from node-a
Apr 8 16:10:36 node-b heartbeat: [3022]: info: acquire all HA resources (standby).
Apr 8 16:10:37 node-b ResourceManager[3035]: [3046]: info: Acquiring resource group: node-b IPaddr2::192.168.1.10/24/eth0/192.168.1.255 drbddisk::mysql Filesystem::/dev/drbd0::/mnt/mysql mysqld
Apr 8 16:10:37 node-b IPaddr2[3058]: [3115]: INFO: Resource is stopped
Apr 8 16:10:37 node-b ResourceManager[3035]: [3129]: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.1.10/24/eth0/192.168.1.255 start
Apr 8 16:10:37 node-b IPaddr2[3160]: [3195]: INFO: ip -f inet addr add 192.168.1.10/24 brd 192.168.1.255 dev eth0
Apr 8 16:10:37 node-b IPaddr2[3160]: [3197]: INFO: ip link set eth0 up
Apr 8 16:10:37 node-b IPaddr2[3160]: [3199]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.1.10 eth0 192.168.1.10 auto not_used not_used
Apr 8 16:10:37 node-b IPaddr2[3131]: [3203]: INFO: Success
Apr 8 16:10:37 node-b ResourceManager[3035]: [3232]: info: Running /etc/ha.d/resource.d/drbddisk mysql start
Apr 8 16:10:37 node-b kernel: drbd0: role( Secondary -> Primary )
Apr 8 16:10:37 node-b kernel: drbd0: Writing meta data super block now.
Apr 8 16:10:38 node-b Filesystem[3249]: [3293]: INFO: Resource is stopped
Apr 8 16:10:38 node-b ResourceManager[3035]: [3307]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/mysql start
Apr 8 16:10:38 node-b Filesystem[3320]: [3350]: INFO: Running start for /dev/drbd0 on /mnt/mysql
Apr 8 16:10:38 node-b Filesystem[3320]: [3355]: INFO: Starting filesystem check on /dev/drbd0
Apr 8 16:10:38 node-b kernel: kjournald starting. Commit interval 5 seconds
Apr 8 16:10:38 node-b kernel: EXT3 FS on drbd0, internal journal
Apr 8 16:10:38 node-b kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 8 16:10:38 node-b Filesystem[3309]: [3366]: INFO: Success
Apr 8 16:10:38 node-b ResourceManager[3035]: [3406]: info: Running /etc/init.d/mysqld start

[root@node-b ~]# ip addr show eth0
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:41:a0:8e brd ff:ff:ff:ff:ff:ff
inet 192.168.1.12/24 brd 192.168.1.255 scope global eth0
inet 192.168.1.10/24 brd 192.168.1.255 scope global secondary eth0
inet6 fe80::20c:29ff:fe41:a08e/64 scope link
valid_lft forever preferred_lft forever

This shows that node-b has taken over from node-a as master.

Now restore eth0 on node-a:

ifup eth0

The log on node-b shows that node-a's eth0 is detected as up again:
Apr 8 16:15:11 node-b heartbeat: [1902]: info: Link node-a:eth0 up.
Apr 8 16:15:11 node-b ipfail: [1994]: info: Link Status update: Link node-a/eth0 now has status up
Apr 8 16:15:14 node-b ipfail: [1994]: info: Ping node count is balanced.

node-b is still the master.


3. Test the heartbeat service going down


[root@node-b ~]# /etc/rc.d/init.d/heartbeat stop
Stopping High-Availability services:
[ OK ]

The log on node-a shows:
[root@node-a ~]# tail -f /var/log/ha-debug
ipfail[2021]: 2009/04/08_16:14:00 debug: Other side is unstable.
heartbeat[1929]: 2009/04/08_16:14:04 info: Received shutdown notice from 'node-b'.
heartbeat[1929]: 2009/04/08_16:14:04 info: Resources being acquired from node-b.
heartbeat[1929]: 2009/04/08_16:14:04 debug: StartNextRemoteRscReq(): child count 1
heartbeat[3311]: 2009/04/08_16:14:04 info: acquire all HA resources (standby).
ResourceManager[3340][3357]: 2009/04/08_16:14:04 info: Acquiring resource group: node-a IPaddr2::192.168.1.10/24/eth0/192.168.1.255 drbddisk::mysql Filesystem::/dev/drbd0::/mnt/mysql mysqld
IPaddr2[3381][3495]: 2009/04/08_16:14:06 INFO: Resource is stopped
IPaddr2[3389][3503]: 2009/04/08_16:14:06 INFO: Resource is stopped
heartbeat[3312]: 2009/04/08_16:14:06 info: Local Resource acquisition completed.
heartbeat[1929]: 2009/04/08_16:14:06 debug: StartNextRemoteRscReq(): child count 2
heartbeat[1929]: 2009/04/08_16:14:06 debug: StartNextRemoteRscReq(): child count 1
ResourceManager[3340][3515]: 2009/04/08_16:14:06 info: Running /etc/ha.d/resource.d/IPaddr2 192.168.1.10/24/eth0/192.168.1.255 start
ResourceManager[3340][3516]: 2009/04/08_16:14:06 debug: Starting /etc/ha.d/resource.d/IPaddr2 192.168.1.10/24/eth0/192.168.1.255 start
IPaddr2[3546][3581]: 2009/04/08_16:14:07 INFO: ip -f inet addr add 192.168.1.10/24 brd 192.168.1.255 dev eth0
IPaddr2[3546][3583]: 2009/04/08_16:14:07 INFO: ip link set eth0 up
IPaddr2[3546][3585]: 2009/04/08_16:14:07 INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.1.10 eth0 192.168.1.10 auto not_used not_used
IPaddr2[3517][3589]: 2009/04/08_16:14:07 INFO: Success
ResourceManager[3340][3590]: 2009/04/08_16:14:07 debug: /etc/ha.d/resource.d/IPaddr2 192.168.1.10/24/eth0/192.168.1.255 start done. RC=0
ResourceManager[3340][3618]: 2009/04/08_16:14:07 info: Running /etc/ha.d/resource.d/drbddisk mysql start
ResourceManager[3340][3619]: 2009/04/08_16:14:07 debug: Starting /etc/ha.d/resource.d/drbddisk mysql start
ResourceManager[3340][3623]: 2009/04/08_16:14:07 debug: /etc/ha.d/resource.d/drbddisk mysql start done. RC=0
Filesystem[3635][3679]: 2009/04/08_16:14:08 INFO: Resource is stopped
ResourceManager[3340][3693]: 2009/04/08_16:14:08 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/mysql start
ResourceManager[3340][3694]: 2009/04/08_16:14:08 debug: Starting /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/mysql start
Filesystem[3706][3736]: 2009/04/08_16:14:09 INFO: Running start for /dev/drbd0 on /mnt/mysql
Filesystem[3706][3741]: 2009/04/08_16:14:09 INFO: Starting filesystem check on /dev/drbd0
Filesystem[3695][3752]: 2009/04/08_16:14:09 INFO: Success


[root@node-a ~]# ip addr show eth0
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:8c:0c:be brd ff:ff:ff:ff:ff:ff
inet 192.168.1.11/24 brd 192.168.1.255 scope global eth0
inet 192.168.1.10/24 brd 192.168.1.255 scope global secondary eth0
inet6 fe80::20c:29ff:fe8c:cbe/64 scope link
valid_lft forever preferred_lft forever

[root@node-a ~]# netstat -anpt
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 3881/mysqld
tcp 0 0 192.168.100.11:49788 192.168.100.12:7789 ESTABLISHED -
tcp 0 0 192.168.100.11:7789 192.168.100.12:53825 ESTABLISHED -
tcp 0 0 :::22 :::* LISTEN 1741/sshd
tcp 0 0 ::ffff:192.168.1.11:22 ::ffff:192.168.1.211:1285 ESTABLISHED 2037/0

node-a has taken over from node-b as master.


[root@node-b ~]# /etc/rc.d/init.d/heartbeat start <-- start heartbeat on node-b again
Starting High-Availability services:
2009/04/08_16:20:42 INFO: Resource is stopped
[ OK ]
The log on node-a shows that heartbeat on node-b is detected as started:
[root@node-a ~]# tail -f /var/log/ha-debug
heartbeat[1929]: 2009/04/08_16:16:35 info: Heartbeat restart on node node-b
heartbeat[1929]: 2009/04/08_16:16:35 info: Link node-b:eth0 up.
heartbeat[1929]: 2009/04/08_16:16:35 info: Status update for node node-b: status init
heartbeat[1929]: 2009/04/08_16:16:35 info: Link node-b:eth1 up.
heartbeat[1929]: 2009/04/08_16:16:35 info: Status update for node node-b: status up

At this point node-a is still the master.
All services are working normally. This completes all the tests!


Notes:
Resources are started in order from top to bottom and stopped in order from bottom to top. For example:


IPaddr2::192.168.12.30/24 - Runs /etc/ha.d/resource.d/IPaddr2 192.168.12.30/24 {start,stop}
drbddisk::mysql - Runs /etc/ha.d/resource.d/drbddisk mysql {start,stop}
Filesystem::/dev/drbd0::/mnt/mysql::ext3::defaults - Runs /etc/ha.d/resource.d/Filesystem /dev/drbd0 /mnt/mysql ext3 defaults {start,stop}
mysqld - Runs /etc/init.d/mysqld {start,stop}
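
The ordering rule can be illustrated with a small script. This is only a sketch of the ordering, not how heartbeat is implemented: `resource_order` is a hypothetical function that prints the sequence of actions instead of running any resource script.

```shell
#!/bin/sh
# resource_order start|stop RESOURCE...
# Prints the order in which a resource group is handled: the listed
# order for start, the reverse order for stop. (Hypothetical helper;
# it prints the steps and does not run any script.)
resource_order() {
    action=$1
    shift
    case $action in
        start)
            for r in "$@"; do
                echo "$r start"
            done
            ;;
        stop)
            # Prepend each entry so the list comes out reversed.
            reversed=
            for r in "$@"; do
                reversed="$r stop
$reversed"
            done
            printf '%s' "$reversed"
            ;;
        *)
            echo "usage: resource_order start|stop RESOURCE..." >&2
            return 2
            ;;
    esac
}

# Resource names as they appear in the logs of this article:
#   resource_order stop IPaddr2 drbddisk Filesystem mysqld
```

For the resource group used in the tests, this prints mysqld first on stop and IPaddr2 last: the database is stopped before the filesystem is unmounted, DRBD is demoted after that, and the virtual IP is released at the end.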
