Tuesday, 14 May 2013

MySQL slow query example of Index Merge Intersection

My client reported a sudden slow query performance issue, the same query has been running fine for a long time and used to complete in under fraction of a second and now it's taking average 30 seconds to complete.  However, the same query runs fast on development server but slow in production.
MySQL processlist shows that  there are many threads running the same slow query, and the back log continue to grow...

Looking at SQL, it is not complex:

SELECT COUNT(*) 
FROM tblA
WHERE tblA.fkA_ID = 38926722
  AND tblA.fkAType_ID = 1
  AND tblA.fkADetail_ID = 476
  AND tblA.`Date` = '2013-05-10 07:14:41'
  AND tblA.Time = '2013-05-10 07:14:41'

And the EXPLAIN output is:
           id: 1
  select_type: SIMPLE
        table: tblA
         type: index_merge
possible_keys: FK_A_ID,FK_fkAType_ID,FK_fkADetail_ID,IX_Date
          key: FK_ApmAudit2ApmCase,FK_ApmAudit2ApmAuditType
      key_len: 4,1
          ref: NULL
         rows: 1
        Extra: Using intersect(FK_A_ID,FK_fkAType_ID); Using where
1 row in set (0.00 sec)

At first glance, it does not look too bad. MySQL is using two different indexes to search 1 out of 66M rows to return the count of total rows. It is worth mentioning here that column fkAType_ID is not very selective i.e. it holds value 1 or 2. Sadly, 90% of the rows had value 1. This means query is looking at millions of rows before doing index merge with column fkA_ID.

Ok, we know now what's wrong with the query, let's see what options do we have to to optimize this query:?

A) The table hasn't been purged for the past several months; this couldn't be done at this time
B) Pass index hints to the query optimizer to favor index on fkA_ID column.
SELECT COUNT(*) 
FROM tblA USE INDEX (FK_A_ID)
WHERE tblA.fkA_ID = 38926722
  AND tblA.fkAType_ID = 1
  AND tblA.fkADetail_ID = 476
  AND tblA.`Date` = '2013-05-10 07:14:41'
  AND tblA.Time = '2013-05-10 07:14:41'
Query OK, 1 row affected (0.20 sec)

Query completed in a fraction of a second, compared to 30 seconds! Lets look at EXPLAIN output:
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tblA
         type: ref
possible_keys: FK_A_ID
          key: FK_A_ID
      key_len: 4
          ref: const
         rows: 179
        Extra: Using where
1 row in set (0.00 sec)

Once the backlog consumed, in the next available maintenance window, we purged historic data, removed index hints and added following index to the table:

ALTER TABLE tblA 
DROP INDEX `FK_A_ID`, 
ADD INDEX `FK_A_ID_DATE` (`fkA_ID`,`Date`)
Note: index on column FK_fkAType_ID is needed due to foreign key constraint. The table structure info:
CREATE TABLE `tblA` (
  `pkID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `fkApplication_ID` smallint(5) unsigned NOT NULL,
  `fkA_ID` int(10) unsigned NOT NULL,
  `fkAType_ID` tinyint(3) unsigned NOT NULL,
  `Date` date NOT NULL,
...
  KEY `FK_Application` (`fkApplication_ID`),
  KEY `FK_fkAType_ID` (`fkAType_ID`),
  KEY `IX_Date` (`Date`),
  KEY `FK_A_ID` (`fkA_ID`),
...
CONSTRAINT `FK_AType` FOREIGN KEY (`fkAType_ID`) REFERENCES `tblAType` (`pkAType_ID`),
...
) ENGINE=InnoDB AUTO_INCREMENT=505970021 DEFAULT CHARSET=utf8

Monday, 15 April 2013

MongoDB - Identify & optimize inefficient queries


MongoDB contains a database profiling system that can help identify inefficient queries and operations. Enable the profiler by setting the profile value using the following command in the mongo shell:
 db.setProfilingLevel(1)
The documentation of db.setProfilingLevel() for more information about this command.
You can view the profiler’s output by issuing the show profile command in the mongo shell, with the following operation.
db.system.profile.find( { millis : { $gt : 100 } } ).toArray()
[
        {
                "ts" : ISODate("2013-04-15T10:41:33.437Z"),
                "op" : "query",
                "ns" : "test.data",
                "query" : {
                        "value" : 1
                },
                "ntoreturn" : 0,
                "ntoskip" : 0,
                "nscanned" : 4000005,
                "keyUpdates" : 0,
                "numYield" : 0,
                "lockStats" : {
                        "timeLockedMicros" : {
                                "r" : NumberLong(3314599),
                                "w" : NumberLong(0)
                        },
                        "timeAcquiringMicros" : {
                                "r" : NumberLong(6),
                                "w" : NumberLong(8)
                        }
                },
                "nreturned" : 6,
                "responseLength" : 346,
                "millis" : 3314,
                "client" : "127.0.0.1",
                "user" : ""
        }
]




This returned all operations that lasted longer than 100 milliseconds. Ok, so we have found one slow query that took 3314 milliseconds. Lets look at query execution plan i.e. using Explain():
> db.data.find({value : 1}).explain()

{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 6,
        "nscannedObjects" : 4000005,
        "nscanned" : 4000005,
        "nscannedObjectsAllPlans" : 4000005,
        "nscannedAllPlans" : 4000005,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 3617,
        "indexBounds" : {

        },
        "server" : "eugenius:27017"
}
- "millis" : 3617
- "nscanned" : 4000005

It shows that query scans over 4 million rows and would take 3.6 seconds to complete. If we add an index on field "value":

 db.data.ensureIndex({value : 1})

Let's look at Explain output again, so re-run query plan:
 > db.data.find({value : 1}).explain()
{
        "cursor" : "BtreeCursor value_1",
        "isMultiKey" : false,
        "n" : 6,
        "nscannedObjects" : 6,
        "nscanned" : 6,
        "nscannedObjectsAllPlans" : 6,
        "nscannedAllPlans" : 6,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 1,
        "indexBounds" : {
                "value" : [
                        [
                                1,
                                1
                        ]
                ]
        },
        "server" : "eugenius:27017"
}
-"nscanned" : 6
-"millis" : 1
Query plan now looks good, as it processes only 6 documents and completes in 1 million second.

Friday, 12 April 2013

Mongo db stress test using different writeConcern

The aim is to determine how mongo performs using different writeConcern, which control the acknowledgment of write operations with various options, such as: ACKNOWLEDGED [default] UNACKNOWLEDGED JOURNALED JOURNALED_SAFE SAFE FSYNCED FSYNC_SAFE WriteConern options are documented here.
 Test results: Total 200k documents are injected into 'user' collection using 4, 8 and 16 parallel threads
 

UNACKNOWLEDGED option is the fastest of all but not safe, on other hand FSYNC_SAFE is the most safe option but is the slowest of all..

MongoDB - Adding data

Inserting data
Inserting individual documents:
> doc=({"Name" : "Definitive Guide to MongoDB"})
{ "Name" : "Definitive Guide to MongoDB" }
> db.books.insert(doc);
> doc=({"Name" : "Definitive Guide to PHP", "Author" : "AB"})
{ "Name" : "Definitive Guide to PHP", "Author" : "AB" }
> db.books.insert(doc);


Importing contents from file


The mongoimport tool provides a route to import content from a JSON, CSV, or TSV export created by mongoexport, or potentially, another third-party export tool


-- Add sample data into import.json file
$ time for cnt in {1..1000000}; do echo "{\"MongoDB\" : $cnt}">> import.json; done
 
real    0m35.735s
user    0m26.174s
sys     0m9.185s
 
-- Import contents of import.json file
$ time mongoimport --db test --collection samplecollection import.json
connected to: 127.0.0.1
Tue Mar 19 09:38:08             816586/15890088 5%
Tue Mar 19 09:38:08                     55200   18400/second
Tue Mar 19 09:38:11             1546087/15890088        9%
Tue Mar 19 09:38:11                     103600  17266/second
Tue Mar 19 09:38:14             2320487/15890088        14%
Tue Mar 19 09:38:14                     152000  16888/second
Tue Mar 19 09:38:17             3178087/15890088        20%
Tue Mar 19 09:38:17                     205600  17133/second
Tue Mar 19 09:38:20             3957287/15890088        24%
Tue Mar 19 09:38:20                     254300  16953/second
Tue Mar 19 09:38:23             4835687/15890088        30%
Tue Mar 19 09:38:23                     309200  17177/second
Tue Mar 19 09:38:26             5683687/15890088        35%
Tue Mar 19 09:38:26                     362200  17247/second
Tue Mar 19 09:38:29             6536487/15890088        41%
Tue Mar 19 09:38:29                     415500  17312/second
Tue Mar 19 09:38:32             7397287/15890088        46%
Tue Mar 19 09:38:32                     469300  17381/second
Tue Mar 19 09:38:35             8237287/15890088        51%
Tue Mar 19 09:38:35                     521800  17393/second
Tue Mar 19 09:38:38             9059687/15890088        57%
Tue Mar 19 09:38:38                     573200  17369/second
Tue Mar 19 09:38:41             9875687/15890088        62%
Tue Mar 19 09:38:41                     624200  17338/second
Tue Mar 19 09:38:44             10750887/15890088       67%
Tue Mar 19 09:38:44                     678900  17407/second
Tue Mar 19 09:38:47             11538087/15890088       72%
Tue Mar 19 09:38:47                     728100  17335/second
Tue Mar 19 09:38:50             12312487/15890088       77%
Tue Mar 19 09:38:50                     776500  17255/second
Tue Mar 19 09:38:53             13093287/15890088       82%
Tue Mar 19 09:38:53                     825300  17193/second
Tue Mar 19 09:38:56             13958887/15890088       87%
Tue Mar 19 09:38:56                     879400  17243/second
Tue Mar 19 09:38:59             14835687/15890088       93%
Tue Mar 19 09:38:59                     934200  17300/second
Tue Mar 19 09:39:02             15672487/15890088       98%
Tue Mar 19 09:39:02                     986500  17307/second
Tue Mar 19 09:39:02 imported 1000100 objects
 
real    0m57.955s
user    0m18.710s
sys     0m5.052s

Protecting MongoDB Server with Authentication


MongoDB supports a simple authentication model that allows the administrator to restrict access to databases on a per user basis.
MongoDB supports individual access control records on each database that are stored in a special system.users collection. For normal users to have access to two databases (e.g., db1 and db2), their credentials and rights must be added to both databases.
Note: Before you enable authentication on your server, make sure you have added your admin users to the admin database; if you enable authentication before adding the admin users, you will not be able to perform any actions on your database.

Adding an Admin User

Adding the admin user is as simple as changing to the admin database and using the addUser() command:
$ mongo
MongoDB shell version: 2.2.3
connecting to: test
> use admin
switched to db admin
> show collections
> db.addUser("admin","xxxxx")
{
        "user" : "admin",
        "readOnly" : false,
        "pwd" : "9f3121efccbe3fef09a799d5e63077c2",
        "_id" : ObjectId("5148475c7719ce8dd603e52d")
}
>
> exit
bye

Note: Any users added to the special admin database will have the same access rights on all databases

Enabling Authentication
Now you need to alter your server's configuration to enable authentication. Do so by stopping your server and adding --auth to the startup parameters.
$ grep auth /etc/mongod-rs0-1.conf
auth = true
 
$ sudo mongod --dbpath /mnt/mongodb/data/rs0-1-db --shutdown
killing process with pid: 26501
[1]+  Done                    sudo nohup mongod --config /etc/mongod-rs0-1.conf
Restart mongoDB:
$ sudo nohup mongod --config /etc/mongod-rs0-1.conf &
[1] 26982
[akhan2@eugenius ~]$ nohup: appending output to `nohup.out'

Authentication is done using db.auth():

> db.auth("admin","xxxxx");
1

Adding a Read-Only User

The addUser() function includes an additional parameter that allows you to create a user that has only read-only permissions. The MongoDB client will throw an exception if a process authenticated as the newly created user attempts to do anything that would result in a change to the contents of the database. The following example gives a user access to the database for status monitoring or reporting purposes:
First connect to mongo db as 'admin' user:
> use admin
switched to db admin
> db.auth("admin","123");
1
Grant Read-Only permission to user 'userA' on test db:
> use test
switched to db test
> db.addUser("readuser","xxxxx",true);
{
        "user" : "userA",
        "readOnly" : true,
        "pwd" : "e982ddebe2d61f96d42e78cd2033a69f",
        "_id" : ObjectId("51484a69ce8d584df56af233")
}

Test 'userA' authentication:

$ mongo
MongoDB shell version: 2.2.3
connecting to: test
> use test
switched to db test
> db.auth("userA","xxxx");
1
> db.akhan.find();
{ "_id" : ObjectId("514846197876222c62329f27"), "Ad" : 999919 }
{ "_id" : ObjectId("514846197876222c62329f28"), "Ad" : 999920 }

Deleting a User

To remove a user from a database, simply use the normal remove() function for a collection. The following example removes the user just added; note that you have to authenticate against the admin database before you can remove the user:
> use test
switched to db test
> db.system.users.remove({username:"userA"})

Thursday, 4 April 2013

Minimize MySQL replication lag

Replication lag occurs in many MySQL deployments. This often happens due to slow insert/update/delete query that cause the slave to fall behind. However, it could also happen due to high volume of writes to the master database. One of my clients (who use MySQL 5.1) reported replication lag on a slave host, there was no stuck/long query running to cause slave to fall behind. And MySQL process-list show that it's reading/applying events from relay log (i.e. from local copy of master binary logs):

| 2 | system user | | NULL | Connect | 779612 | Waiting for master to send event | NULL |
| 3 | system user | | NULL | Connect | 4215   | Reading event from the relay log | NULL |
And the replication latency continue to grow:  "Seconds_Behind_Master: 4234"
When we look at MySQL dashboard to check on database activity graph (see below), its doing only avg 170 inserts/second. 

MySQL config for replication slave hosts can be configured to behave differently, the benefit is to help assist with replication catch-up. In this scenario we changed the value of innodb_flush_at_trx_commit to 2 from default 1. After this change, significant increase in db activity was observed, replication lag started to reduce sharply, and replication caught up with MySQL database under 5 mins, see below db activity graph:


Thursday, 28 February 2013

MySQL 5.6 - Online DDL Operations

MySQL 5.6 supports several kinds of Online DDL operations on Innodb tables, so it is now possible to perform DML operations such as insert, update, delete and select operations on a table while DDL operation on the subject table is in progress. For more details I strongly advise you to read MySQL docs here and here. In this Blog I would demonstrate few quick examples:

a) Remove column from table 't1' -  DML operations can proceed while DDL is in progress i.e. ALTER TABLE .. DROP column..


mysql >show processlist;
+-----+------+-----------+------+---------+------+----------------+----------------------------------+
| Id  | User | Host      | db   | Command | Time | State          | Info                             |
+-----+------+-----------+------+---------+------+----------------+----------------------------------+
|  67 | root | localhost | test | Query   |    0 | init           | show processlist                 |
| 604 | root | localhost | test | Query   |    3 | altering table | alter table t1 drop column data2 |
+-----+------+-----------+------+---------+------+----------------+----------------------------------+
2 rows in set (0.00 sec)

mysql >insert into t1 values (null,'xxxx','yyyy');
Query OK, 1 row affected (0.00 sec)

mysql >select * from t1 where id=2;
+----+----------------------------------------------------+-------+
| id | data                                               | data2 |
+----+----------------------------------------------------+-------+
|  2 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | a     |
+----+----------------------------------------------------+-------+
1 row in set (0.00 sec)

b) Add new column table 't1' -  DML operations can proceed as long as they do not access the new column that is being added to the table:


mysql>show processlist;
+-----+------+-----------+------+---------+------+----------------+---------------------------------------------------------+
| Id  | User | Host      | db   | Command | Time | State          | Info                                                    |
+-----+------+-----------+------+---------+------+----------------+---------------------------------------------------------+
|  67 | root | localhost | test | Query   |    0 | init           | show processlist                                        |
| 604 | root | localhost | test | Query   |   38 | altering table | alter table t1 add column data2 varchar(20) default 'a' |
+-----+------+-----------+------+---------+------+----------------+---------------------------------------------------------+
2 rows in set (0.00 sec)

mysql>insert into test.t1 values (null,'xxxx');
Query OK, 1 row affected (0.02 sec)

mysql>insert into test.t1 values (null,'xxxx','yyyy');
ERROR 1136 (21S01): Column count doesn't match value count at row 1