Sporadic problems such as occasional server stalls can be difficult to diagnose. It is important to capture good diagnostic data when the problem is clearly happening. And it gets frustrating when you don't know when next it will hit your database. So, we need to find a way to know when problem happens,Let's call it 'diagnostic trigger'. We need to find good indicator e.g. sudden sever's load spike and/or spike in status varaibles such as Threads_connected, Threads_running.
Thanks to perconna Toolkit, pt-stalk - Watches for a trigger condition to become true, and then collects data to help in diagnosing problems. I performed few quick test using pt-stalk:
Installation:
At the time of writing this article percona-toolki2.1.2 is the latest release:
wget http://www.percona.com/downloads/percona-toolkit/2.1.2/percona-toolkit-2.1.2-1.noarch.rpm
You might have to install following required by percona-toolkit:
yum install perl-Time-HiRes.x86_64
yum install perl-TermReadKey.x86_64
Quick Examples:
A) Collect diagnostic data when at least 40 threads are running,
# pt-stalk --function status --variable Threads_running \
--threshold 40 --prefix=mysql_trouble --prefix=mysql_trouble --cycles=2-- \
-uroot -pxxxx
....
2012_06_23_10_13_11 Check results: Threads_running=1, matched=no, cycles_true=0
2012_06_23_10_13_12 Check results: Threads_running=1, matched=no, cycles_true=0
2012_06_23_10_13_14 Check results: Threads_running=1, matched=no, cycles_true=0
2012_06_23_10_13_15 Check results: Threads_running=40, matched=yes, cycles_true=1
2012_06_23_10_13_16 Check results: Threads_running=41, matched=yes, cycles_true=2
2012_06_23_10_13_16 Collect triggered
2012_06_23_10_13_16 Collector PID 16113
2012_06_23_10_13_16 Sleeping 300 seconds after collect
And the stored diagnostic data is available in its default location:
# ls -lh /var/lib/pt-stalk/ total 572K -rw-r--r-- 1 root root 4.5K Jun 23 11:08 mysql_trouble-df -rw-r--r-- 1 root root 153 Jun 23 11:08 mysql_trouble-disk-space -rw-r--r-- 1 root root 16K Jun 23 11:08 mysql_trouble-diskstats ... -rw-r--r-- 1 root root 15K Jun 23 11:08 mysql_trouble-top -rw-r--r-- 1 root root 386 Jun 23 11:08 mysql_trouble-trigger -rw-r--r-- 1 root root 8.1K Jun 23 11:08 mysql_trouble-variables -rw-r--r-- 1 root root 992 Jun 23 11:08 mysql_trouble-vmstat -rw-r--r-- 1 root root 245 Jun 23 11:08 mysql_trouble-vmstat-overall
B) Collect data when at least 20 queries running in the 'copying to tmp table' State.
pt-stalk --function processlist --variable State \
--match Copying --threshold 20 --prefix=mysql_trouble --cycles=2 \
-- -uroot -pxxxxx
..
2012_06_23_11_54_19 Check results: State=21, matched=yes, cycles_true=1
2012_06_23_11_54_20 Check results: State=21, matched=yes, cycles_true=2
2012_06_23_11_54_20 Collect triggered
2012_06_23_11_54_21 Collector PID 9154
2012_06_23_11_54_21 Sleeping 300 seconds after collect
You will probably like to run this tool as daemon - Here’s a sample configuration file for triggering when there are more than 40 threads running at once:
# Config for pt-stalk variable=Threads_running cycles=2 # trigger if problem seen twice in a row dest=/var/log/mysql_diagnostics threshold=40 daemonize -- --user=monitoring --password=xxxxSyntax to run pt-stalk as daemon:# pt-stalk --config /etc/pt-stalk.cnf
For more information about configuration files click here
No comments:
Post a Comment