1. Use the compress()/uncompress() functions in the database
http://dev.mysql.com/doc/refman/5.1/en/encryption-functions.html#function_compress
2. The InnoDB plugin offers table compression (a short sketch of options 1 and 2 follows the list)
http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-background.html
3. Compress/uncompress data in the client, not in the database. This offloads both CPU and network load from the database server: compression and decompression use CPU cycles on the client machine instead, which can be appropriate for a distributed application with many clients or where the client machines have spare CPU capacity.
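Here is a minimal sketch of options 1 and 2. The table and column names (messages, messages_compressed, content) are made up for the example and are not the schema used in the benchmarks below:

```sql
-- Option 2: InnoDB table compression (with the 5.1 InnoDB plugin this
-- requires innodb_file_per_table=1 and innodb_file_format=Barracuda).
CREATE TABLE messages_compressed (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  content BLOB
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

-- Option 1: compress a single column explicitly with COMPRESS(),
-- and decompress it on read with UNCOMPRESS().
INSERT INTO messages (content) VALUES (COMPRESS('some long text ...'));
SELECT UNCOMPRESS(content) FROM messages WHERE id = 1;
```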
We performed benchmarks (using the table structure described in my previous post, binary-log-synchronization) to evaluate the impact of the data compression techniques above (1 and 2 only). Test data was loaded using 8 parallel database sessions into the original table, into the compressed table, and into the original table with the compress() function applied to the "content" column.
| Table Format | Time to load 80k records | Data file size |
|---|---|---|
| Original table | 0m57.515s | 1.6GB |
| Compressed table (ROW_FORMAT=COMPRESSED) | 1m9.613s | 648MB |
| Original table using compress() function | 0m35.883s | 156MB |
The results show that inserting data in compressed form (i.e. using the compress() function) is the fastest approach, and it also gives better compression than the InnoDB compressed table. With the compress() function we can choose to compress only selected columns, whereas InnoDB table compression applies to the entire table.
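To make the three load variants concrete, the INSERT statements differ only in where compression happens. This is a rough sketch with hypothetical table names (the real schema is in the earlier post):

```sql
SET @payload = REPEAT('some text ', 10000);  -- sample row content

-- Variant 1: original table, data stored uncompressed.
INSERT INTO t_original (content) VALUES (@payload);

-- Variant 2: InnoDB-compressed table; the INSERT is unchanged because
-- page compression is transparent to SQL.
INSERT INTO t_compressed (content) VALUES (@payload);

-- Variant 3: same table layout as the original, but only the bulky
-- column is compressed explicitly at insert time.
INSERT INTO t_compress_fn (content) VALUES (COMPRESS(@payload));
```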
We also performed benchmarks to read data from the original table, the compressed table, and the original table using the uncompress() function (to decompress the compressed data). A total of 800k lookups were performed by id.
| Table | Time to perform 800k lookups |
|---|---|
| Compressed table (ROW_FORMAT=COMPRESSED) | 1m26.426s |
| Original table using uncompress() function | 1m26.197s |
| Original table | 6m29.504s |
The test results show that lookups against the compressed table (or against compressed data via the uncompress() function) are more than four times faster than against the original table. The reason is that the compressed data is much smaller than the original table, so most of it fits in memory, which reduces disk I/O.
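For reference, a point lookup against the compress()-column variant decompresses the value in the query itself; a minimal sketch using the same hypothetical names as above:

```sql
-- Lookup by id; UNCOMPRESS() restores the original text on the fly,
-- and UNCOMPRESSED_LENGTH() reports the size before compression.
SELECT id,
       UNCOMPRESS(content)          AS content,
       UNCOMPRESSED_LENGTH(content) AS original_bytes
FROM t_compress_fn
WHERE id = 12345;
```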
If your clients are on separate server(s) from the database, do the compression/decompression in the client. It is easier to scale clients than the database Master.
If you are compressing what was a TEXT field, use a BLOB field for the compressed equivalent. (Ditto for MEDIUMTEXT/MEDIUMBLOB, etc.)
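For example, if an existing TEXT column is repurposed to hold compressed data, a binary type is the natural fit, since compress() returns a binary string (hypothetical table name again):

```sql
-- Change the column type so compressed (binary) values are stored safely;
-- MEDIUMBLOB pairs with what was a MEDIUMTEXT column.
ALTER TABLE messages MODIFY content MEDIUMBLOB;
```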
Rule of Thumb: English text, JSON, XML, code, etc., compress about 3:1. JPEG, PDF, etc., are already compressed; you should not bother compressing them again.
If you are mixing compressible and incompressible stuff, or long and short values, do not bother to have an "is_compressed" flag; simply compress everything.
Alternatively, you could mix compressed and uncompressed values in the same BLOB -- check "HEX(LEFT(col, 2)) == '789C'". (At least, that is the magic for some client compression functions.)
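A sketch of that check, assuming client-side zlib compression at default settings (which emits the 0x78 0x9C header); note that MySQL's own compress() output starts with a 4-byte length prefix instead, and data compressed in the client would normally be decompressed in the client as well:

```sql
-- Flag rows whose BLOB starts with the zlib magic bytes 0x78 0x9C.
SELECT id,
       HEX(LEFT(content, 2)) = '789C' AS looks_zlib_compressed
FROM messages;
```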