Thursday 25 November 2010

Query Tuning (Query optimization) Part-1

In this artical I'll try to explain query tuning techniques, to start with we'll need to capture slow queries, that is, log queries executing longer than long_query_time server variable (in seconds but supports microseconds when logging to file). The slow query log help identify candidates for query optimization.

Assumptions:

1) The reader has basic MySQL/Unix skills.


1) Enable slow query log:
Add following lines into MySQL option file (i.e. /etc/my.cnf) under [mysqld] section

log-slow-queries=mysql_slow.log
long-query-time=2

Note: If you specify no name for the slow query log file, the default name is host_name-slow.log.

And restart MySQL server when it's safe to do so e.g. /etc/init.d/mysql restart

For further details on slow query log, please visit here:
# if using 5.1
http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
# if using 5.0
http://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html

If you are using MySQL 5.1, you can enable slow log this way:

mysql> set global long_query_time = 2;
mysql> set global slow_query_log = 1;
mysql> set global slow_query_log_file = 'mysql_slow.log‘;

2) Process Slow Query Log:
You can process the whole slow log file and find most frequent slow queries using mysqldumpslow utility

# Find top 10 slowest queries

$ mysqldumpslow -s t -n 10 /path/to/mysql_slow.log >mysql_slow.log.c
logReading mysql slow query log from /path/to/mysql_slow.log

Count: 1 Time=1148.99s (1148s) Lock=0.00s (0s) Rows=0.0 (0),
insert into a select * from b
Count: 37 Time=2.28s (84s) Lock=0.11s (4s) Rows=0.0 (0),
Update a set CONTENT_BINARY = 'S' where ID = 3874
Count: 1 Time=29.31s (29s) Lock=0.00s (0s) Rows=0.0 (0), select max(LOCK_VERSION) from b
...

For further details on using mysqldumpslow please visit here:
http://mysqlopt.blogspot.com/search?q=mysqldumpslow
http://dev.mysql.com/doc/refman/5.1/en/mysqldumpslow.html

3) Using EXPLAIN to Analyze slow queries:
this will provide us find
a) Whether optimizer is using existing idnexes
b) Can help qualify query rewrites
c) Points out need to index

4) Main query performance issues:
a) Full table scans
b) Temporary tables
c) Filesort

5) Let's start with basics, that is, 'Full table scans' situations

mysql> EXPLAIN select * from tblmeshort where timestamp between '2010-08-17 00:00:00' and '2010-08-17 01:20:00'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tblmeshort
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 400336
Extra: Using where
1 row in set (0.00 sec)

The EXPLAIN ouput suggests that optimizer will do 'FULL TABLE SCAN', as indicated by type: ALL.

The solution is to add an index on `timestamp` column, as we are filtering rows using this column.

mysql> alter table tblmeshort add index (`timestamp`);
Query OK, 0 rows affected (1.84 sec)
Records: 0 Duplicates: 0 Warnings: 0

Let's re-run EXPLAIN on the same query

mysql> EXPLAIN select * from tblmeshort where timestamp between '2010-08-17 00:00:00' and '2010-08-17 01:20:00'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tblmetricshort
type: range
possible_keys: timestamp
key: timestamp
key_len: 4
ref: NULL
rows: 1
Extra: Using where
1 row in set (0.00 sec)

Now let's remove this index and look at another example:

mysql> alter table tblmehort drop index `timestamp`;
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT * FROM tblmeshort
INNER JOIN tblMetric
ON (tblmehort.fkMetric_ID=tblMetric.pkMetric_ID)
WHERE timestamp between '2010-08-17 00:00:00' and '2010-08-17 01:20:00'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tblmeshort
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 400336
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: tblMetric
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: metrics.tblmehort.fkMetric_ID
rows: 1
Extra: Using where
2 rows in set (0.00 sec)

In the above example, the optimizer will perform full table scan on tblmeshort first and when MySQL goes looking for rows in tblMetric, instead of table scanning like it did before, it will use the value of fkMetric_ID with the 'PRIMARY KEY' of tblMetric table to directly fetch matching rows from tblMetric. Thus this SQL is partially optimized, that is, it does scan all rows of tblmeshort but it uses index to join tables.

The solution is to add an index on `timestamp` column and re-run EXPLAIN with same query

mysql> EXPLAIN SELECT * FROM tblmetricshort INNER JOIN tblMetric ON (tblmetricshort.fkMetric_ID=tblMetric.pkMetric_ID) WHERE timestamp between '2010-08-17 00:00:00' and '2010-08-17 01:20:00'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: tblmetricshort
type: range
possible_keys: timestamp
key: timestamp
key_len: 4
ref: NULL
rows: 1
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: tblMetric
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: metrics.tblmetricshort.fkMetric_ID
rows: 1
Extra: Using where
2 rows in set (0.00 sec)


Continue....


No comments:

Post a Comment