Monday, 23 July 2012

Innodb purging stalls

One of my clients reported an incident where they are getting 'Table Full' error against Innodb table. However, there seem to be ample free Innodb space available. So, when I looked at host server, I found they are running old mysql 5.0 release on 32bit platform, Innodb monitor output show:

History list length 2051529 
Main thread process no. 9701, id 1111792528, state: sleeping

Oh Innodb purge thread isn't running, and there is no change in "history length".
Note: Purge thread never progresses when the main thread is in a 'sleep' state.

Related BUG 13847885. Quoting:
 Deleting a huge amount of data from InnoDB tables within a short time could cause the purge operation that removes delete-marked records to stall. This issue could result in unnecessary disk space use, but does not cause any problems with data integrity. If this issue causes a disk space shortage, restart the server to work around it. This issue is only likely to occur on 32-bit platforms
Bug fixed in 5.0.97, 5.1.63, 5.5.23

Solution: Restart MySQL and upgrade MySQL version

Saturday, 7 July 2012

MySQL - working with xml data

MySQL (5.1 and onward) provide XML functions; such as ExtractValue() that can be helpful to read required elements from xml; instead of returning entire xml back to the client application we can return one (or more) xml elements containing required data.
ExtractValue(xml_fragment, xpath_expression)
In the following example, we use ExtractValue() with column content as the first argument. For the second argument, we employ an XPath expression that means “the content of the element having the name attribute "bankName"”:
SELECT ExtractValue(content, '//variable[@name="bankName"][1]') "bankName" from xml_test where id=2;  
 | bankName |  
 | xyz      |  
 1 row in set (0.00 sec)  

 We can read more than one attribute:

SELECT ExtractValue(content, '//variable[@name="bankName"][1]') "bankName", ExtractValue(content, '//variable[@name="productType"][1]') "productType" from xml_test where id=1;  
 | bankName | productType |  
 | xyz      | cvv=***     |  
 1 row in set (0.00 sec)  
Similarly we can extract all attributes of a particular xml element 
 SELECT ExtractValue(content,'//dictionary[@name="client_b"]/*') as "all attributes" from xml_test where id=2;  
 | all attributes |  
 | abc cvv=***    |  
 1 row in set (0.00 sec)  

Saturday, 23 June 2012

MySQL - capturing diagnostic data

Sporadic problems such as occasional server stalls can be difficult to diagnose. It is important to capture good diagnostic data when the problem is clearly happening. And it gets frustrating when you don't know when next it will hit your database. So, we need to find a way to know when problem  happens,Let's call it 'diagnostic trigger'. We need to find good indicator e.g. sudden sever's load spike and/or spike in status varaibles such as Threads_connected, Threads_running.
 Thanks to perconna Toolkit, pt-stalk - Watches for a trigger condition to become true, and then collects data to help in diagnosing problems. I performed few quick test using pt-stalk:
At the time of writing this article percona-toolki2.1.2 is the latest release:

You might have to install following required by percona-toolkit:
yum install perl-Time-HiRes.x86_64
yum install perl-TermReadKey.x86_64

Quick Examples:

A) Collect diagnostic data when at least 40 threads are running,

# pt-stalk  --function status --variable Threads_running \
--threshold 40 --prefix=mysql_trouble --prefix=mysql_trouble --cycles=2-- \
-uroot -pxxxx
2012_06_23_10_13_11 Check results: Threads_running=1, matched=no, cycles_true=0
2012_06_23_10_13_12 Check results: Threads_running=1, matched=no, cycles_true=0
2012_06_23_10_13_14 Check results: Threads_running=1, matched=no, cycles_true=0
2012_06_23_10_13_15 Check results: Threads_running=40, matched=yes, cycles_true=1
2012_06_23_10_13_16 Check results: Threads_running=41, matched=yes, cycles_true=2
2012_06_23_10_13_16 Collect triggered
2012_06_23_10_13_16 Collector PID 16113
2012_06_23_10_13_16 Sleeping 300 seconds after collect 
And the stored diagnostic data is available in its default location:
# ls -lh /var/lib/pt-stalk/
total 572K
-rw-r--r-- 1 root root 4.5K Jun 23 11:08 mysql_trouble-df
-rw-r--r-- 1 root root  153 Jun 23 11:08 mysql_trouble-disk-space
-rw-r--r-- 1 root root  16K Jun 23 11:08 mysql_trouble-diskstats
-rw-r--r-- 1 root root  15K Jun 23 11:08 mysql_trouble-top
-rw-r--r-- 1 root root  386 Jun 23 11:08 mysql_trouble-trigger
-rw-r--r-- 1 root root 8.1K Jun 23 11:08 mysql_trouble-variables
-rw-r--r-- 1 root root  992 Jun 23 11:08 mysql_trouble-vmstat
-rw-r--r-- 1 root root  245 Jun 23 11:08 mysql_trouble-vmstat-overall

B) Collect data when at least 20 queries running in the 'copying to tmp table' State.

pt-stalk  --function processlist --variable State \
--match Copying --threshold 20 --prefix=mysql_trouble --cycles=2 \
-- -uroot -pxxxxx
2012_06_23_11_54_19 Check results: State=21, matched=yes, cycles_true=1
2012_06_23_11_54_20 Check results: State=21, matched=yes, cycles_true=2
2012_06_23_11_54_20 Collect triggered
2012_06_23_11_54_21 Collector PID 9154
2012_06_23_11_54_21 Sleeping 300 seconds after collect 

You will probably like to run this tool as daemon - Here’s a sample configuration file for triggering when there are more than 40 threads running at once:

# Config for pt-stalk
cycles=2  # trigger if problem seen twice in a row
Syntax to run pt-stalk as daemon: # pt-stalk --config /etc/pt-stalk.cnf  For more information about configuration files click here

Saturday, 16 June 2012

MySQL - How to scale inserts

Usually we expect inserts with large number of threads would to be faster but this is not always ture. The bottleneck might be the CPU, IO controller ...  MySQL can not fully use available cores/cpus e.g. MySQL 5.0 (Innodb) is limited to 4 cores etc. One of my clients had a problem scaling inserts, they have two data processing clusters each of which use 40 threads - so total 80 threads insert data into MySQL database (version 5.0.51). The issue, inserts are delayed by minute(s) and the backlog continue to grow... After examining innodb monitor status we found long list of transactions waiting for AUTO-INC lock: For example:

------- TRX HAS BEEN WAITING 9 SEC FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `alertxxxx/alertdata` trx id 0 685590474 lock mode AUTO-INC waiting ------------------ ---TRANSACTION 0 685590465, ACTIVE 10 sec, process no 8457, OS thread id 1169045824 setting auto-inc lock

Why AUTO-INC lock?  - When accessing the auto-increment counter, InnoDB uses a special table-level AUTO-INC lock that it keeps to the end of the current SQL statement, not to the end of the transaction. This basically cause all inserts into the same table to serialize. With single row inserts it is normally not too bad but could prevent scalability with multiple threads inserting Bug #16979. However, you can reach better through put (inserts per second) with less threads. So after dropping number of threads on both clusters by 50% initially - taking it to 20-20 sessions. The problem almost disappeared and when we further reduced number of threads to 10-10 sessions, the problem disappeared!

Beginning with MySQL 5.1.22 - new locking model introduced for handling Innodb auto-increment in InnoDB. There is a good article which talks about this here and here

Similarly, if you want to achieve fast insert performance, it can be interesting to load files instead of the loading the inserts one by one : it is 3 to 4 times faster. If the goal is a huge amount of data already known at that time, it is probably the best option.


1. Optimize database structure -  Design your tables to minimize their space on the disk. This can result in huge improvements by reducing the amount of data written to and read from disk. Smaller tables normally require less main memory while their contents are being actively processed during query execution. Any space reduction for table data also results in smaller indexes that can be processed faster. Ensure columns have the right data types and require least amount of storage, you should try to use the most precise type in all cases. For example, if an integer column is used for values in the range from 1 to 99999, MEDIUMINT UNSIGNED is the best type. For more information about optimizing database structure click here. If you store large strings (TEXT) or BLOB, compression may help there is an article about it here 

2. Innodb flush method - e.g. O_DIRECT, if used can help to avoid double buffering between the InnoDB buffer pool and the operating system's filesystem cache. MySQL reference manual explain this here.
[ Warning] O_DIRECT, serializes the writes in ext3. Howerver, impact can be lowered by using innodb_file_per_table)

3. Innodb thread concurrency - Keep the value low for this option variable (default 8 ok), however the correct value for this variable is dependent on environment and workload. This option variable is explained here

4. Innodb buffer pool - Innodb maintains a buffer pool for caching data and indexes in memory. Making the pool larger can improve performance by reducing the amount of disk I/O needed, here is the good article that explains here and here.

5. Innodb log file size- The larger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk I/O. But larger log files also mean that recovery is slower in case of a crash. As of MySQL 5.5, recovery times have been greatly improved and the whole log file flushing algorithm has been improved. In 5.5 you generally want larger log files as recovery is improved. Article about selecting innodb log file size is here

6 Innodb_flush_log_at_trx_commit -. The default value of 1 is the value required for ACID compliance. You can achieve better performance by setting the value different from 1, but then you can lose at most one second worth of transactions in a crash. With a value of 0, any mysqld process crash can erase the last second of transactions. With a value of 2, then only an operating system crash or a power outage can erase the last second of transactions. However, InnoDB's crash recovery is not affected and thus crash recovery does work regardless of the value.

7. Filesystem selection and disk issues - 'xfs' is known to perform very well with MySQL. Also writing the redo logs, binary logs, data files in different physical disks is a good practice with a bigger gain than server configuration. RAID 10 is recommended for best performance, more detail about disk issue can be found here and here


Wednesday, 30 May 2012

Data compression

Data compression not only significantly reduces the storage required for the database, but also improves throughput by reducing the I/O workload, at a modest cost in processing overhead. The storage cost savings can be important, but the reduction in I/O costs can be even more valuable. There are three possible ways to achieve that

1. Use compress/uncompress function in the database

2. Innodb-plugin offers table compression

3. Compress/uncompress data in client not in the database. This will offload more stuff – CPU and network. . This approach uses CPU cycles for compression and uncompression on the client machine rather than the database server, which might be appropriate for a distributed application with many clients, or where the client machine has spare CPU cycles.

We performed benchmarks (using table structre mentioned on my previous post: binary-log-synchronization) to evaluate the impact of using above mentioned data compression techniques (1 & 2 only), test data  was loaded using 8 parallel database sessions  into original table, compressed table and as well as into original table but using compress function to compress "content" column.

Time to
load 80k records
Data file size
0m57.515s 1.6GB
1m9.613s 648MB
using compress()
0m35.883s  156MB

The results show that inserting data in compressed format (i.e. using compress() function) is the fastest approach and it gives better compression than using innodb compressed table. Using compress() function we have the control to compress only selected columns whereas innodb compressed table compress the entire table.

We also perfomed benchmarks to read data from original table, compressed table and original table but using uncompress() function (to uncompress compressed data), total 800k lookups were perfomed using id=<>
Table Time to perform
800k lookups
using uncompress()
Original table 6m29.504s

The test results show that lookups performed againts compressed tables are at least 6 times faster than original table. The reason for which is that compressed table is much smaller than original table thus most of the data can fit into memory and hence reducing disk I/Os.

Wednesday, 18 April 2012

Binary log Synchronization

MySQL sync_binlog option variable is used to control how often binary log may be synchronized to disk. By default, it is not synchronized to disk at each write. So if the operating system or machine (not only the MySQL server) crashes, there is a chance that the last statements of the binary log are lost. To prevent this, you can make the binary log be synchronized to disk after every N writes to the binary log, with the sync_binlog system variable. With value of 1 - every single statement  is written to disk, it means that the performance suffers, usually a lot unless the disk has a battery-backed cache enabled, which makes synchronization fast.

I performed some load tests to see how it may hurt database performance when the battery-backed disk cache is missing:

Total 40,000 records inserted into table using 4 parallel db connections:

sync_binlog time
0 0m17.972s
1 1m4.599s
2 0m44.364s
3 0m34.197s
4 0m18.693s 

sync_binlog with value of 3 simply means: only fsync binlog every third transaction.
The higher you increase the value of sync_binlog, the closer the results will be to sync_binlog=0 (no fsyncing).

Value of 1 is safest of course, but as you see, without decent battery backed write caching controller, performance is very bad.

Test case :

Tools used:

-- MySQL 5.1.56
-- mysqlslap
-- platform -
Red Hat 4.1.2-50 x86_64
, RAID 5 (8 disks, 67.75 GB each, Vendor DELL)
, Intel(R) Xeon(R) CPU 5148  @ 2.33GHz, 4 CPUs
, 8G RAM

# Connect to MySQL client and perform:

drop database if exists test_case_9;
create database test_case_9;
use test_case_9;
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`content` blob,

drop function if exists insert_data;
delimiter //
CREATE DEFINER=`root`@`localhost` FUNCTION `insert_data`() RETURNS int(11)
INSERT INTO t(id,content) values (null,REPEAT('bla bla',5000));
delimiter ;

-- Change sync_binlog to value 0
mysql> set global sync_binlog=0;

mysql> quit

# Run following steps from a shell prompt

-- load data using mysqlslap tool
$ time mysqlslap --user=root  --query="select test_case_9.insert_data();"  --concurrency=4 --iterations=10000;
        Average number of seconds to run all queries: 0.001 seconds
        Minimum number of seconds to run all queries: 0.001 seconds
        Maximum number of seconds to run all queries: 0.229 seconds
        Number of clients running queries: 4
        Average number of queries per client: 1

real    0m17.972s
user    0m1.306s
sys     0m3.096s

A) -- Change sync_binlog to value 1 and empty test table 't'

$ mysql -uroot -S /mnt/database/mysql_5.1/mysql.sock test_case_9 -e"truncate table t; set global sync_binlog=1;"

B) -- load data using mysqlslap tool

$ time mysqlslap --user=root  --query="select test_case_9.insert_data();"  --concurrency=4 --iterations=10000;
        Average number of seconds to run all queries: 0.006 seconds
        Minimum number of seconds to run all queries: 0.003 seconds
        Maximum number of seconds to run all queries: 0.018 seconds
        Number of clients running queries: 4
        Average number of queries per client: 1

real    1m4.239s
user    0m1.672s
sys     0m4.203s

Repeat steps A and B to run load tests using Value 2, 3, and 4 with synch_binlog

Tuesday, 6 March 2012

MySQL - Event Scheduler

MySQL Events are tasks that run according to a schedule. Therefore, we sometimes refer to them as scheduled events. Basically an event is a named database object containing one or more SQL statements to be executed at one or more regular intervals, beginning and ending at a specific date and time. Conceptually, this is similar to the idea of the Unix crontab (also known as a “cron job”) or the Windows Task Scheduler.
Here is the summary of Event object features:
- An event is uniquely identified by its name and the schema to which it is assigned
- An event performs a specific action according to a schedule. This action consists of an SQL statement, which can be a compound statement in BEGIN .. END;
- An event's timing can be either one-time or recurrent. A one-time event executes one time only. A recurrent event repeats its action at a regular interval.
For further details, visit :
Let's create an Event object to increment 'counter' column of event_test table every minute:
mysql> CREATE EVENT my_event
DO UPDATE event_test set counter=counter+1;
Query OK, 0 rows affected (0.00 sec)
Metadata about events can be obtained as follows:
mysql> SHOW CREATE EVENT my_event\G
*************************** 1. row ***************************
               Event: my_event
           time_zone: SYSTEM
        Create Event: CREATE EVENT `my_event` 
ON SCHEDULE EVERY 1 MINUTE STARTS '2012-03-06 16:46:01' 
DO update event_test set counter=counter+1
character_set_client: latin1
collation_connection: latin1_swedish_ci
  Database Collation: latin1_swedish_ci
1 row in set (0.00 sec)

Since this is a repeating event and may not terminate within its scheduling interval, the result may be multiple instances of the event executing simultaneously. Which is undesirable,  use GET_LOCK() function or LOCK TABLE to avoid this.
Check if the 'my_event' has actually worked
mysql> SELECT SLEEP(60); SELECT * FROM event_test;
| SLEEP(60) |
|         0 |
1 row in set (1 min 0.01 sec)

| counter |
|       0 |
1 row in set (0.00 sec)

No it has not worked, because  event scheduler thread is not ON. The global event_scheduler system variable determines whether the Event Scheduler is enabled and running on the server.
mysql> select @@GLOBAL.event_scheduler ;
| @@GLOBAL.event_scheduler |
| OFF                      |
1 row in set (0.00 sec)
To enable Event Scheduler use this sql:
mysql> set GLOBAL event_scheduler =ON;
Query OK, 0 rows affected (0.00 sec)
For more information about event scheduler, see
When the Event Scheduler is ON, the event scheduler thread is listed in the output of SHOW PROCESSLIST as a daemon process, and its state is represented as shown here:

| 43 | event_scheduler | localhost | NULL | Daemon  |   35 |
 Waiting for next activation | NULL             |
Lets check the status of 'counter' after truning event Scheduler ON
mysql> SELECT SLEEP(60); SELECT * FROM event_test;
| SLEEP(60) |
|         0 |
1 row in set (1 min 0.00 sec)

| counter |
|       1 |
1 row in set (0.00 sec)

mysql> SELECT SLEEP(60); SELECT * FROM event_test;
| SLEEP(60) |
|         0 |
1 row in set (1 min 0.01 sec)

| counter |
|       2 |
1 row in set (0.00 sec)
To disable my_event, use this ALTER EVENT statement:
ALTER EVENT my_event
It was just a simple example to show,
that how to automate database task inside MySQL server.

Wednesday, 25 January 2012

How to check if Innodb log files are big enough

InnoDB uses log files to store changes that must be applied to the database after service interruption (e.g power outage, crash). Thus, It is important for good performance that the Innodb log files are big enough.

In this example, I would demonstrate how to check the amount of InnoDB log file space in use, follow these steps ( checking current usage at peak times):
Examine INNODB MONITOR output (i.e. SHOW ENGINE INNODB STATUS\G) and look at these lines (if you are using 5.0 or 5.1 without Innodb plugin):
Log sequence number 2380 1505869110
Log flushed up to   2380 1505868967
Last checkpoint at  2380 936944426
1 pending log writes, 0 pending chkp writes
532415047 log i/o's done, 544.17 log i/o's/second

Perfrom following calulation (formula) to know log space used. The values to use in calculation are "Log sequence number" and "Last checkpoint at".

select (( ( 2380 *  4 * 1024 * 1024 * 1024) + 1505869110 ) - ( ( 2380 *  4 * 1024 * 1024 * 1024) + 936944426 )) /1024/1024 "Space used in MB";
| Space used in MB |
|     542.56885910 |
1 row in set (0.00 sec)

In this example 542 megabytes is more than 75% of the total log space, so the logsize (512MB) is small. Ensure that the amount of log space used never exceeds 75% of that value (542MB).  Find the instructions here to add/resizeInnodb log files.

There is slightly different method to calculate innodb log file space used (if you are using MySQL 5.5 or InnoDB plugin in MySQL 5.1): Examine INNODB MONITOR output (i.e. SHOW ENGINE INNODB STATUS\G) and look at these lines:
Log sequence number 2388016708
Log flushed up to   2388016690
Last checkpoint at  2380597012
Perfrom following calulation (formula) to know log space used. The values to use in calculation are "Log sequence number" and "Last checkpoint at".

SELECT (2388016708 - 2380597012)/1024/1024 "Space used in MB";
| Space used in MB |
|     7.0759735111 |
1 row in set (0.00 sec)

 In this example 7MB is not more than 75% of the total log space. As of MySQL 5.5, recovery times have been greatly improved and the whole log file flushing algorithm has been improved. In 5.5 you generally want larger log files as recovery is improved. Thus, the large innodb log files allow you to take advantage of the new algorithm.

Monday, 23 January 2012

MySQL Server Tuning

MySQL server tuning is important; if you mainly use Innodb tables then you need to check how well your Innodb buffer pool is sized.  InnoDB use it to cache data and indexes. The larger you set this value, the less disk I/O is needed to access data in tables. click here for more detail about innodb buffer pool

You can examine Innnodb buffer pool efficiency by looking at STATUS variables:

mysql> SHOW GLOBAL STATUS LIKE 'innodb_buffer_pool_rea%'; Innodb_buffer_pool_read_requests | 4519597979 | 
Innodb_buffer_pool_reads         | 55253      | 

 Innodb_buffer_pool_read_requests are number of request to read a row from the buffer pool and Innodb_buffer_pool_reads is the number of times Innodb has to perform read data from disk to fetch required data pages.

So innodb_buffer_pool_reads/innodb_buffer_pool_read_requests*100= 0.001 is the efficiency.  Thus, we see the vast majority of times INNODB is able to statisfy requests from memory, it's pretty normal for databases to have hot spots in which you're accessing only a portion of the data the majority of the time.

Let's look at this example,

mysql> SHOW GLOBAL STATUS LIKE 'innodb_buffer_pool_rea%'; Innodb_buffer_pool_read_requests  | 2905072850 | 
Innodb_buffer_pool_reads          | 1073291394 |

Calculate Innodb buffer pool efficiency:
(107329139/ 2905072850*100) = 37

Here the Innodb is doing more disk reads, Innodb buffer pool is not big enough!

Another way to examine innodb efficiency is to examine SHOW ENGINE INNODB STATUS output:

Total memory allocated 1838823262; in additional pool allocated 10550784
Buffer pool size   102400
Free buffers 0 

Database pages     101424
Modified db pages  24189
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 2050214912, created 313658031, written 755828632
36.95 reads/s, 12.12 creates/s, 7.05 writes/s

Buffer pool hit rate 993 / 1000

  • If there are numerous Innodb free buffers, you may need to reduce Innodb buffer pool. 
  • Innodb hit ratio: 1000/1000 identifies a 100% hit rate 
  • Slightly lower hit rate values may acceptable.
  • If you find Innodb hit ratio is less than 95% then you may need to increase Innodb buffer pool size.

    • Note: On a dedicated database server, you may set innodb_buffer_pool_size up to 80% of the machine physical memory size. However, do not set it too large because competition for physical memory might cause paging in the operating system.

      Database size is usually much bigger than available RAM, so most of the time it's no feasible to have buffer pool size equals to database size. Luckily, innodb creates hot spots inside buffer pool, that is, most frequently accessed data pages stay in memory but if your application performs a lot of random disk I/Os and your database is too large to fit in memory and only small percentage of data pages can be cached in innodb buffer pool, then adding more RAM is the best solution for random-read I/O problem.

      If you still get poor Buffer pool hit rate, enable slow query log to capture bad queries that perform table scans and hence blow out buffer pool cache.