This applies to a replication architecture like the one detailed here Improving Replication Performance.
My situation was Master2 (which runs the blackhole engine) ran out of disk space and the current binlog was lost. After freeing up disk space on Master2 and restarting its replication it was now missing hours worth of binlog that its slaves are expecting to find. So to remedy this, I pointed the slaves directly back to Master1 to slurp up the missing binlog and then switch them back to Master2 once they were caught up.
========
Login to one of the replication slaves and make sure replication is stopped
mysqladmin stop
Review the final statements processed in the relay log
mysqlbinlog db00-relay-bin.000178 | tail -n 20
# at 51573639 #100706 18:58:13 server id 8 end_log_pos 110500816 Query thread_id=72390897 exec_time=19 error_code=0 SET TIMESTAMP=1278442693/*!*/; UPDATE tableA SET views=views+1 WHERE id='123' /*!*/; # at 51573817 #100706 18:58:13 server id 8 end_log_pos 110500876 Query thread_id=72390897 exec_time=19 error_code=0 SET TIMESTAMP=1278442693/*!*/; COMMIT /*!*/;
Login to Master1 and find the same block of statements in its binlog
mysqlbinlog master-bin.00135 | less
I like to search via the timestamp in line 3 above.
# at 975963380 #100706 18:58:13 server id 8 end_log_pos 975963462 Query thread_id=723908 97 exec_time=0 error_code=0 SET TIMESTAMP=1278442693/*!*/; BEGIN /*!*/; # at 975963462 #100706 18:58:13 server id 8 end_log_pos 975963640 Query thread_id=723908 97 exec_time=0 error_code=0 SET TIMESTAMP=1278442693/*!*/; UPDATE tableA SET views=views+1 WHERE id='123' /*!*/; # at 975963640 #100706 18:58:13 server id 8 end_log_pos 975963667 Xid = 2030221535 COMMIT/*!*/;
On the slave, change it’s master to the actual master using the “end_log_pos” of the final statement
change master to master_host='10.0.1.10', \ master_user='replicant', \ master_password='s3cret', \ master_log_file='mysql-bin.000135', \ master_log_pos=975963667; start slave;
Wait for it to catch up
show slave status\G
Stop replication from Master1
stop slave;
Again, look up its final statement
mysqlbinlog db00-relay-bin.000178 | tail -n 20
# at 52117220 #100706 19:03:41 server id 8 end_log_pos 111044278 Query thread_id=72393308 exec_time=20 error_code=0 SET TIMESTAMP=1278443021/*!*/; BEGIN /*!*/; # at 52117279 #100706 19:03:41 server id 8 end_log_pos 111044428 Query thread_id=72393308 exec_time=20 error_code=0 SET TIMESTAMP=1278443021/*!*/; UPDATE tableA SET views=views+1 WHERE id='5647' /*!*/; # at 52117429 #100706 19:03:41 server id 8 end_log_pos 111044488 Query thread_id=72393308 exec_time=20 error_code=0 SET TIMESTAMP=1278443021/*!*/; COMMIT /*!*/;
This time locate its location in Master2’s binlog
mysqlbinlog blackhole-bin.000006 | less
# at 111044219 #100706 19:03:41 server id 8 end_log_pos 111044278 Query thread_id=72393308 exec_time=20 error_code=0 SET TIMESTAMP=1278443021/*!*/; BEGIN /*!*/; # at 111044278 #100706 19:03:41 server id 8 end_log_pos 111044428 Query thread_id=72393308 exec_time=20 error_code=0 SET TIMESTAMP=1278443021/*!*/; UPDATE tableA SET views=views+1 WHERE id='5647' /*!*/; # at 111044428 #100706 19:03:41 server id 8 end_log_pos 111044488 Query thread_id=72393308 exec_time=20 error_code=0 SET TIMESTAMP=1278443021/*!*/; COMMIT /*!*/;
Now change the slave’s master back to Master2
change master to master_host='10.0.1.20', \ master_user='replicant', \ master_password='s3cret', \ master_log_file='blackhole-bin.000006', \ master_log_pos=111044488; start slave; show slave status\G
Will do a follow up post with how to verify the slave’s have up-to-date and accurate data.
Pingback: Tweets that mention Recovering from catastrophic failure of the replication master « Dctr Watson -- Topsy.com()
Pingback: A Coffee Table Book…about Coffee Tables! | tea house()