Thursday, 25 November 2010

Query Tuning (Query optimization) Part-1

In this artical I'll try to explain query tuning techniques, to start with we'll need to capture slow queries, that is, log queries executing longer than long_query_time server variable (in seconds but supports microseconds when logging to file). The slow query log help identify candidates for query optimization.

Assumptions:

1) The reader has basic MySQL/Unix skills.


1) Enable slow query log:
Add following lines into MySQL option file (i.e. /etc/my.cnf) under [mysqld] section

log-slow-queries=mysql_slow.log
long-query-time=2

Note: If you specify no name for the slow query log file, the default name is host_name-slow.log.

And restart MySQL server when it's safe to do so e.g. /etc/init.d/mysql restart

For further details on slow query log, please visit here:
# if using 5.1
http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
# if using 5.0
http://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html

If you are using MySQL 5.1, you can enable slow log this way:

mysql> set global long_query_time = 2;
mysql> set global slow_query_log = 1;
mysql> set global slow_query_log_file = 'mysql_slow.log‘;

2) Process Slow Query Log:
You can process the whole slow log file and find most frequent slow queries using mysqldumpslow utility

# Find top 10 slowest queries

$ mysqldumpslow -s t -n 10 /path/to/mysql_slow.log >mysql_slow.log.c
logReading mysql slow query log from /path/to/mysql_slow.log

Count: 1 Time=1148.99s (1148s) Lock=0.00s (0s) Rows=0.0 (0),
insert into a select * from b
Count: 37 Time=2.28s (84s) Lock=0.11s (4s) Rows=0.0 (0),
Update a set CONTENT_BINARY = 'S' where ID = 3874
Count: 1 Time=29.31s (29s) Lock=0.00s (0s) Rows=0.0 (0), select max(LOCK_VERSION) from b
...

For further details on using mysqldumpslow please visit here:
http://mysqlopt.blogspot.com/search?q=mysqldumpslow
http://dev.mysql.com/doc/refman/5.1/en/mysqldumpslow.html

3) Using EXPLAIN to Analyze slow queries:
this will provide us find
a) Whether optimizer is using existing idnexes
b) Can help qualify query rewrites
c) Points out need to index

4) Main query performance issues:
a) Full table scans
b) Temporary tables
c) Filesort

5) Let's start with basics, that is, 'Full table scans' situations

mysql> EXPLAIN select * from tblmeshort where timestamp between '2010-08-17 00:00:00' and '2010-08-17 01:20:00'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tblmeshort
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 400336
Extra: Using where
1 row in set (0.00 sec)

The EXPLAIN ouput suggests that optimizer will do 'FULL TABLE SCAN', as indicated by type: ALL.

The solution is to add an index on `timestamp` column, as we are filtering rows using this column.

mysql> alter table tblmeshort add index (`timestamp`);
Query OK, 0 rows affected (1.84 sec)
Records: 0 Duplicates: 0 Warnings: 0

Let's re-run EXPLAIN on the same query

mysql> EXPLAIN select * from tblmeshort where timestamp between '2010-08-17 00:00:00' and '2010-08-17 01:20:00'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tblmetricshort
type: range
possible_keys: timestamp
key: timestamp
key_len: 4
ref: NULL
rows: 1
Extra: Using where
1 row in set (0.00 sec)

Now let's remove this index and look at another example:

mysql> alter table tblmehort drop index `timestamp`;
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT * FROM tblmeshort
INNER JOIN tblMetric
ON (tblmehort.fkMetric_ID=tblMetric.pkMetric_ID)
WHERE timestamp between '2010-08-17 00:00:00' and '2010-08-17 01:20:00'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tblmeshort
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 400336
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: tblMetric
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: metrics.tblmehort.fkMetric_ID
rows: 1
Extra: Using where
2 rows in set (0.00 sec)

In the above example, the optimizer will perform full table scan on tblmeshort first and when MySQL goes looking for rows in tblMetric, instead of table scanning like it did before, it will use the value of fkMetric_ID with the 'PRIMARY KEY' of tblMetric table to directly fetch matching rows from tblMetric. Thus this SQL is partially optimized, that is, it does scan all rows of tblmeshort but it uses index to join tables.

The solution is to add an index on `timestamp` column and re-run EXPLAIN with same query

mysql> EXPLAIN SELECT * FROM tblmetricshort INNER JOIN tblMetric ON (tblmetricshort.fkMetric_ID=tblMetric.pkMetric_ID) WHERE timestamp between '2010-08-17 00:00:00' and '2010-08-17 01:20:00'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tblmetricshort
type: range
possible_keys: timestamp
key: timestamp
key_len: 4
ref: NULL
rows: 1
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: tblMetric
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: metrics.tblmetricshort.fkMetric_ID
rows: 1
Extra: Using where
2 rows in set (0.00 sec)


Continue....


Tuesday, 9 November 2010

Short index length (good or bad)

If you find the work load on DB server is disk bound and you do not have enough memory to increase innodb buffer pool size. We can help improve the performance by reducing the length of indexes, this will result more indexes fit into memory and would increase write operations. It may impact SELECT queries but not necessarly for bad. Allow me to show some example. I will lower the index length of my test table from 40 char to just 8 char:
Note: You may consider trying different lengths for your index and check which best fit your workload (both writing and reading speed).

ALTER TABLE csc52021 DROP INDEX value_40, ADD INDEX value_8(value(8));
To look all the rows that have a value between 'aaa' and 'b':
mysql> EXPLAIN SELECT COUNT(*) FROM csc52021 WHERE value>'aaa%' AND
mysql> <'b';
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| 1  | SIMPLE      | csc52021 | range | value_8   | value_8     | 11      | NULL | 9996 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> SELECT COUNT(*) FROM csc52021 WHERE value&gt;'aaa%' AND value&lt;'b';
+----------+
| COUNT(*) |
+----------+
| 5533     |
+----------+
1 row in set (0.02 sec)
As you can see, the index is used.
What if look for a specific value?
mysql> EXPLAIN SELECT COUNT(*) FROM csc52021 WHERE
mysql> value='a816d6ce93c2aa992829f7d0d9357db896d5e7de';
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table    | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
| 1  | SIMPLE      | csc52021 | ref  | value_8       | value_8 | 11      | const | 1    | Using where |
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM csc52021 WHERE
mysql> value='a816d6ce93c2aa992829f7d0d9357db896d5e7de';
+----------+
| COUNT(*) |
+----------+
| 1        |
+----------+
1 row in set (0.00 sec)

SELECT statements does a lookup for a value that have very low cardinality (for example, if we had millions of rows that starts with the same first 16 char "a816d6ce93c2aa99"), then a short index is not very efficient.
On the other hand, a short index length allows to fit more indexes value in memory, increasing the lookup speed.

In my test table, I can decrease the index length to 4 and still have good performance. For example:

mysql> ALTER TABLE csc52021 DROP INDEX value_8, ADD INDEX
mysql> value_4(value(4));

mysql> EXPLAIN SELECT COUNT(*) FROM csc52021 WHERE
mysql> value='a816d6ce93c2aa992829f7d0d9357db896d5e7de';
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table    | type | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
| 1  | SIMPLE      | csc52021 | ref  | value_4       | value_4 | 7       | const | 2    | Using where |
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql&gt; EXPLAIN SELECT COUNT(*) FROM csc52021 WHERE value LIKE 'a816%';
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| 1  | SIMPLE      | csc52021 | range | value_4       | value_4 | 7       | NULL | 2    | Using where |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql&gt; SELECT COUNT(*) FROM csc52021 WHERE value LIKE 'a816%';
+----------+
| COUNT(*) |
+----------+
| 2        |
+----------+
1 row in set (0.00 sec)

To summarize:
if you have good cardinality, a shorter index length will increase both writes and reads operations, especially if you are IO bound.

Thursday, 29 July 2010

Backup mysql binary logs



It is always good to make a backup of all the log files you are about to delete. Alternatively if you take incremental backups then you should rotate the binary log by using FLUSH LOGS. This done, you need to copy to the backup location all binary logs which range from the one of the moment of the last full or incremental backup to the last but one. These binary logs are the incremental backup

Here is the bash script, which you can use to backup binary logs, all you need to do is change following param according to your needs and all yours. This script is not mine, I got the idea from here:


#
# This script backup binary log files
#

backup_user=dba
backup_password=xxxx
backup_port=3306
backup_host=localhost 
log_file=/var/log/binlog_backup.log
binlog_dir=/mnt/database/logs # Path to binlog
backup_dir=/mnt/archive/binlogs/tench # Path to Backup directory

PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
export PATH

Log()
{
echo "`date` : $*" >> $log_file
}

mysql_options()
{
common_opts="--user=$backup_user --password=$backup_password"
if [ "$backup_host" != "localhost" ]; then
common_opts="$common_opts --host=$backup_host --port=$backup_port"
fi
}

mysql_command()
{
mysql $common_opts --batch --skip-column-names  -e "$2"
}

mysql_options

Log "[INIT] Starting MySQL binlog backup"

Log "Flushing MySQL binary logs (FLUSH LOGS)"

mysql_command mysql "flush logs"

master_binlog=`mysql_command mysql "show master status" 2>/dev/null | cut -f1`

Log "Current binary log is: $master_binlog"

copy_status=0

for b in `mysql_command mysql "show master logs" | cut -f1`
do
if [ -z $first_log ]; then
first_log=$b
fi
if [ $b != $master_binlog ]; then
Log "Copying binary log ${b} to ${backup_dir}"
rsync -av $backup_host:/$binlog_dir/$b $backup_dir >& /dev/null
if [ $? -ne 0 ]; then
copy_status=1
break
fi
else
break
fi
done

if [ $copy_status -eq 1 ]; then
Log "[ERR] Failed to copy binary logs cleanly...aborting"
exit 1
fi

Wednesday, 28 July 2010

NOW() function is not replication-safe

It says on mysql doc that NOW () function is replication-safe and they have given an example too to prove this fact that you will obtain the same result on slave as on the master.
http://dev.mysql.com/doc/refman/5.1/en/replication-features-functions.html

I'll try to prove it with an example that NOW()function is not replication-safe. Suppose Master is located in 'New york' and Slave is in 'London' and both servers are using local time. On Master you create table and insert a row as well as you explicitly set session time zone to 'SYSTEM'.

Note: Both Master/Slave are using identical version of MySQL, i.e. 5.1.47

mysql> CREATE TABLE test_now_func (mycol DATETIME);
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO test_now_func VALUES ( NOW() );
Query OK, 1 row affected (0.00 sec)

mysql> SET TIME_ZONE=’SYSTEM’;

mysql> SELECT * FROM test_now_func;
+---------------------+
| mycol |
+---------------------+
| 2009-09-01 12:00:00 |
+---------------------+
1 row in set (0.00 sec)

However if you do a SELECT on slaves copy you will see different result:
mysql> SELECT * FROM test_now_func;
+---------------------+
| mycol |
+---------------------+
| 2009-09-01 17:00:00 |
+---------------------+
1 row in set (0.00 sec)

The correct solution is recorded some where else, that is, the same system time zone should be set for both master and slave.
http://dev.mysql.com/doc/refman/5.1/en/replication-features-timezone.html

For example

[mysqld]
..
timezone=’America/New_York’

Friday, 23 July 2010

Key cache hit ratio

When we talk about key cache performance we usually look at these two status variables.

1. Key_read_requests
The number of requests to read a key block from the cache.

2. Key_reads
The number of physical reads of a key block from disk.

There is a good reason to examine Key_reads, because we know that disks are very slow relative to RAM but the Key_reads aren't always physical disk reads at all. If the requested block of data isn't in the operating system's cache, then a Key_read is a disk read, you will be lucky if it is cached, then it's just a system call. Having said that it is always good to minimize key_reads which may cause randon disk I/O.

The optimum solution is to keep the ratio Key_reads : Key_read_requests should be 1:100 and Key_writes / Key_write_requests should always be less than 1.

Finally I'll like to show you something partially useful you can do with Key_reads:

[xxxxxx ~]$ mysqladmin -uroot -p ext -ri10 | grep Key_reads
Enter password:
| Key_reads | 44605148 |
| Key_reads | 4 |
| Key_reads | 4 |
| Key_reads | 13 |
| Key_reads | 9 |
| Key_reads | 6 |
| Key_reads | 20 |
| Key_reads | 6 |
| Key_reads | 11 |

This server is doing approximately 7 Key_reads every ten seconds but it is hard to say that how many of them are random I/O and how many are just system calls (i.e. read from operrating system's cache)

Wednesday, 14 July 2010

upgrading to innodb plugin

As of MySQL 5.1.38, the InnoDB Plugin is included in MySQL 5.1 releases, in addition to the built-in version of InnoDB that has been included in previous releases. So all we need to do is just enable it, that is, add following options into my.cnf file

[mysqld]
..
ignore-builtin-innodb
plugin-load=innodb=ha_innodb_plugin.so

for further readings visit here http://dev.mysql.com/doc/refman/5.1/en/news-5-1-38.html

Assuming you have succesfully enabled Innodb Plugin, now what? would you automatically get benefits from all new Innodb plugin features?
The answer is yes/no, because Innodb Plugin supports new file format 'Barracuda', which introduces two such new data structures: compressed
, and long variable-length columns stored off-page. By default, the InnoDB Plugin does not create tables in a format that is incompatible with the built-in InnoDB in MySQL.
This means you won't get these two features automatically until you create new tables using new file format or convert exisiting tables to new format.

Innodb Plugin offers lots of other features too and these features are enabled by default, also you will automatically get benefit from these features such as fast index creation and lots of performance and scalability enhancments,
for full details click here
here http://www.innodb.com/doc/innodb_plugin-1.0/innodb-performance.html

Friday, 2 July 2010

Repair by keycache vs Repair by sorting

It is advised to disable keys on MYISAM/ARCHIVE tables before performing bulk insert operations should give a considerable speedup, especially when you have many indexes. And then enable keys to to re-create missing indexes, this task might take much longer than bulk insert operation, sometimes it may take days to finish.

One approach would be not to disable keys at first place but would this help at all? the answer is it may or may not. Personally I do not suggest this approach. If you find enable key task is taking longer then start new session with MySQL and check the state of enable keys command. If A SHOW PROCESSLIST reveal, not surprisingly, the dreaded "Repair with keycache", which seems to multiply indexing time by a factor of 20 to 30. If you want to see "repair by sort" to be used then theoretical max size of every single index must fit into myisam_max_sort_file_size.

Solution: increase myisam_max_sort_file_size

mysql> SET GLOBAL myisam_max_sort_file_size=100000 * 1024 * 1024

And restart enable keys

Similarly if your repair takes several hours then either use above solution or
you might think of using the utility 'myisamchk' with the -n (or --sort-recover) option.