Thursday, 4 April 2013

Minimize MySQL replication lag

Replication lag occurs in many MySQL deployments. This often happens due to slow insert/update/delete query that cause the slave to fall behind. However, it could also happen due to high volume of writes to the master database. One of my clients (who use MySQL 5.1) reported replication lag on a slave host, there was no stuck/long query running to cause slave to fall behind. And MySQL process-list show that it's reading/applying events from relay log (i.e. from local copy of master binary logs):

| 2 | system user | | NULL | Connect | 779612 | Waiting for master to send event | NULL |
| 3 | system user | | NULL | Connect | 4215   | Reading event from the relay log | NULL |
And the replication latency continue to grow:  "Seconds_Behind_Master: 4234"
When we look at MySQL dashboard to check on database activity graph (see below), its doing only avg 170 inserts/second. 

MySQL config for replication slave hosts can be configured to behave differently, the benefit is to help assist with replication catch-up. In this scenario we changed the value of  innodb_flush_log_at_trx_commit  to 2 from default 1. After this change, significant increase in db activity was observed, replication lag started to reduce sharply, and replication caught up with MySQL database under 5 mins, see below db activity graph:


  1. The correct setting is innodb_flush_log_at_trx_commit = 2