Database Administration: Data compression

Data compression not only significantly reduces the storage required for the database, but also improves throughput by reducing the I/O workload, at a modest cost in processing overhead. The storage cost savings can be important, but the reduction in I/O costs can be even more valuable. There are three possible ways to achieve that

1. Use compress/uncompress function in the database
http://dev.mysql.com/doc/refman/5.1/en/encryption-functions.html#function_compress

2. Innodb-plugin offers table compression
http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-background.html

3. Compress/uncompress data in client not in the database. This will offload more stuff – CPU and network. . This approach uses CPU cycles for compression and uncompression on the client machine rather than the database server, which might be appropriate for a distributed application with many clients, or where the client machine has spare CPU cycles.

We performed benchmarks (using table structre mentioned on my previous post: binary-log-synchronization) to evaluate the impact of using above mentioned data compression techniques (1 & 2 only), test data was loaded using 8 parallel database sessions into original table, compressed table and as well as into original table but using compress function to compress "content" column.

Table Format	Time to load 80k records	Data file size
Original Table	0m57.515s	1.6GB
Compressed Table: ROW_FORMAT =COMPRESSED	1m9.613s	648MB
Original Table using compress() function	0m35.883s	156MB

The results show that inserting data in compressed format (i.e. using compress() function) is the fastest approach and it gives better compression than using innodb compressed table. Using compress() function we have the control to compress only selected columns whereas innodb compressed table compress the entire table.

We also perfomed benchmarks to read data from original table, compressed table and original table but using uncompress() function (to uncompress compressed data), total 800k lookups were perfomed using id=<>

Table	Time to perform 800k lookups
Compressed Table: row_format= compressed	1m26.426s
Original Table using uncompress() function	1m26.197s
Original table	6m29.504s

The test results show that lookups performed againts compressed tables are at least 6 times faster than original table. The reason for which is that compressed table is much smaller than original table thus most of the data can fit into memory and hence reducing disk I/Os.

1 comment:

Anonymous3 July 2012 at 12:58
If your clients are on separate server(s) than the database, do the compression/decompression in the client. It is easier to scale clients than the database Master.

If you are compressing what was a TEXT field, use a BLOB field for the compressed equivalent. (Ditto for MEDIUMTEXT/MEDIUMBLOB, etc.)

Rule of Thumb: English text, JSON, XMS, code, etc, compress 3:1. JPEG, PDF, etc, are already compressed; you should not bother compressing again.

If you are mixing compressible and incompressible stuff, or long and short values, do not bother to have a "is_compressed" flag; simply compress everything.

Alternatively, you could mix compressed and uncompressed values in the same BLOB -- check "HEX(LEFT(col, 2)) == '789C'". (At least, that is the magic for some client compression functions.)

Database Administration

Wednesday, 30 May 2012

Data compression

1 comment:

About Me

Blog Archive