Saturday 23 June 2012

MySQL - capturing diagnostic data


Sporadic problems such as occasional server stalls can be difficult to diagnose. It is important to capture good diagnostic data when the problem is clearly happening. And it gets frustrating when you don't know when next it will hit your database. So, we need to find a way to know when problem  happens,Let's call it 'diagnostic trigger'. We need to find good indicator e.g. sudden sever's load spike and/or spike in status varaibles such as Threads_connected, Threads_running.
 Thanks to perconna Toolkit, pt-stalk - Watches for a trigger condition to become true, and then collects data to help in diagnosing problems. I performed few quick test using pt-stalk:
Installation:
At the time of writing this article percona-toolki2.1.2 is the latest release:
wget http://www.percona.com/downloads/percona-toolkit/2.1.2/percona-toolkit-2.1.2-1.noarch.rpm

You might have to install following required by percona-toolkit:
yum install perl-Time-HiRes.x86_64
yum install perl-TermReadKey.x86_64

Quick Examples:

A) Collect diagnostic data when at least 40 threads are running,


# pt-stalk  --function status --variable Threads_running \
--threshold 40 --prefix=mysql_trouble --prefix=mysql_trouble --cycles=2-- \
-uroot -pxxxx
....
2012_06_23_10_13_11 Check results: Threads_running=1, matched=no, cycles_true=0
2012_06_23_10_13_12 Check results: Threads_running=1, matched=no, cycles_true=0
2012_06_23_10_13_14 Check results: Threads_running=1, matched=no, cycles_true=0
2012_06_23_10_13_15 Check results: Threads_running=40, matched=yes, cycles_true=1
2012_06_23_10_13_16 Check results: Threads_running=41, matched=yes, cycles_true=2
2012_06_23_10_13_16 Collect triggered
2012_06_23_10_13_16 Collector PID 16113
2012_06_23_10_13_16 Sleeping 300 seconds after collect 
And the stored diagnostic data is available in its default location:
# ls -lh /var/lib/pt-stalk/
total 572K
-rw-r--r-- 1 root root 4.5K Jun 23 11:08 mysql_trouble-df
-rw-r--r-- 1 root root  153 Jun 23 11:08 mysql_trouble-disk-space
-rw-r--r-- 1 root root  16K Jun 23 11:08 mysql_trouble-diskstats
...
-rw-r--r-- 1 root root  15K Jun 23 11:08 mysql_trouble-top
-rw-r--r-- 1 root root  386 Jun 23 11:08 mysql_trouble-trigger
-rw-r--r-- 1 root root 8.1K Jun 23 11:08 mysql_trouble-variables
-rw-r--r-- 1 root root  992 Jun 23 11:08 mysql_trouble-vmstat
-rw-r--r-- 1 root root  245 Jun 23 11:08 mysql_trouble-vmstat-overall

B) Collect data when at least 20 queries running in the 'copying to tmp table' State.

pt-stalk  --function processlist --variable State \
--match Copying --threshold 20 --prefix=mysql_trouble --cycles=2 \
-- -uroot -pxxxxx
..
2012_06_23_11_54_19 Check results: State=21, matched=yes, cycles_true=1
2012_06_23_11_54_20 Check results: State=21, matched=yes, cycles_true=2
2012_06_23_11_54_20 Collect triggered
2012_06_23_11_54_21 Collector PID 9154
2012_06_23_11_54_21 Sleeping 300 seconds after collect 
 

You will probably like to run this tool as daemon - Here’s a sample configuration file for triggering when there are more than 40 threads running at once:

# Config for pt-stalk
variable=Threads_running
cycles=2  # trigger if problem seen twice in a row
dest=/var/log/mysql_diagnostics
threshold=40
daemonize
--
--user=monitoring
--password=xxxx
Syntax to run pt-stalk as daemon: # pt-stalk --config /etc/pt-stalk.cnf  For more information about configuration files click here

No comments:

Post a Comment