Tuesday 7 April 2009

Centos 5.3 vs Sun Solaris 10- innodb performance

This article compares the performance characteristics of mysql 5.133 operating on Solaris 10 verses Centos 5.3.
The Test result shows that innodb performance on Centos 5.3 is much better than on Sun Solaris 10.

Lab Setup-

The list of hardware used is as follows:

vendor_id : AuthenticAMD

cpu family : 15

model : 107

model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4000+

stepping : 1

cpu MHz : 2100.000

cache size : 512 KB

RAM : 1.5G


The list of software used is as follows:

* CentOS 5.3
* Solaris 5.10
* Mysql 5.1.33
* vmstat

same my.cnf used on both Centos and Solaris.
######################################
[mysqld]
# InnoDB tables
innodb_file_per_table
innodb_buffer_pool_size = 512M
innodb_additional_mem_pool_size = 20M

innodb_log_file_size =128M
######################################

The Server was first loaded with the CentOS operating system and mysql 5.1.33 was installed ( i.e. RPM installtion). Total 23 million rows were inserted into innodb tables using mysqlimport command.

vmstat stats-
The vmstat shows that OS was writing at more than 14MB/s, but no I/O wait is reported,the CPU was at 49% running non-kernel code.

test results on Sun Soloaris 10 continue..

Wednesday 1 April 2009

FEDERATED Storage Engine

I have observed that DML run faster on FEDERATED tables compared to SELECT statements.
FEDERATED isn't very clever when it comes to SELECT, it does not, for example, honor LIMIT clauses but instead of doing it on the remote server, does the LIMIT processing locally. There are other limitations to the protocol that FEDERATED uses that makes it sometimes underperform for SELECT queries.
This is not the case with DML operations as they are handled 100% on the remote server
Strangely not all type of select queries take long except where I use count() function.
Actually FEDERATED move all data to the client and then perform the count() in this case, and it would explain the lack of speed.

Mysqldumpslow -How to

Note: In case if you haven't enabled slow log and are using 5.0.x then add following line into my.cnf under [mysqld] and restart mysql server.
####### 5.0.x
log-slow-queries
long_query_time=5 # capture slowest slow queries first
######

# if you have installed 5.1.x then follow following steps
mysql> set global slow_query_log=1;
Query OK, 0 rows affected (0.02 sec)
mysql> set global long_query_time=5;
Query OK, 0 rows affected (0.00 sec)

You need to have perl installed on your system in order to run mysqldumpslow. If you have perl installed but in different location from /usr/local/bin/perl, you can either edit the mysqldumpslow
script to point to the right location of the perl binary or you can create symbolic link /usr/local /bin/perl that points to the real location of the perl binary
e.g.
mysqldumpslow -s c -t 10 /path-to-slowlog/slowlog.log

If the log file is big, mysqldumpslow might put some load on your cpu so I would suggest that you copy the log file on some other box and run mysqldumpslow there, just to avoid cpu load on your server.

In order to parse only 200 last lines of the file, you will have to do something like this:

tail --lines=200 /data/data/eus3300005-slow.log | mysqldumpslow -t20 -

(tail --lines=200 will get only last 200 lines and the last "-" parameter
for mysqldumpslow tell's it to read the log from stdin)

For solaris 10, please use:

tail -200 /data/data/eus3300005-slow.log | mysqldumpslow -t20 -

Transaction isolation levels

The four levels, in ascending order of strictness, are:

* READ UNCOMMITTED:
* READ COMMITTED:
* REPEATABLE READ:
* SERIALIZABLE:

There is a big difference between READ COMMITTED and REPEATABLE READ. Basically READ COMMITED means that within one transaction, you will be able to see *commited* changes by another transaction. So within the same transaction you can get different values for the same query. For example, we have transaction T1 which reads a row:

BEGIN
SELECT row FROM t WHERE id=1; ---> this returns row="my row"

Now transaction T2 does an update and commits it:

BEGIN
UPDATE t SET row="updated row" WHERE id=1;
COMMIT

Back to T1, if we issue the query again we would get:

SELECT row FROM t WHERE id=1; ---> this returns row="updated row"

So we have different values of "row" within the same transaction.


The case is different in a repeatable read. As the name suggests, every read you do will be "repeated". So in the T1 transaction above, the SELECT statement will always return row="my row" even if other transactions (T2) change the value of row.
With REPEATABLE READ, your T1 transaction is guaranteed to read the same value every time it does an identical SELECT (even if another T2 transaction changes those SELECTED values in the meantime).
This will not happen in READ COMMITTED. Within the T1 transaction it is possible that the same SELECT will give different results (because T2 changed them in the meantime), ie, the isolation level is lower.

I'll discuss the remaining two in my next post...

Concurrency, transactions and isolation levels

Concurrency, transactions and isolation levels are a complicated matter so I understand it's initially difficult to grasp the concepts.

Choosing an isolation level is basically a trade off between the amount of concurency and obtaining "the correct and expected" data. Higher isolation levels (like SERIALIZABLE) lower the amount of concurrency because one transaction will depend on the outcome of the other and therefore can not run at the same time (more locking is needed). However, this insures that results are "more correct" as other transactions can't modify it.

It's difficult to explain these concepts so let me try it with an example. Say you are at an ATM machine and want to withdraw some money. You ask the machine how much money you have and it says $500. So you say, ok, continue and withdrawn $500 dollars. The ATM will do:

withdraw($500) if balance >= $500

Just before you ask the ATM to withdrawn (but after you've seen your balance), your telephone company tries to withdraw $100 to pay for your phone bill.

With READ COMMITTED, the ATM will read your balance, which is now $500-$100=$400 and refuse to give you the $500 (it read the committed value and now knows you only have $400), even though it just told you you had $500.

With REPEATABLE READ, the ATM will read your balance, but it is guarantee to read the same value, i.e., $500 and will give you the $500 dollars. You may ask, what happens to the $100 withdraw from the phone company? It depends on implementation, but usually the phone bill transaction will try to do the update, find a lock and wait until the ATM has finished. That is why higher isolation levels have lower concurrency but more "accurate" data, as them make transactions more serializable.