LIMIT a, b is O(N*N)

By Mark Callaghan on Sunday, December 13, 2009 at 8:48am

I need to hack MySQL to stop parsing LIMIT a, b. This is a high-risk feature for an OLTP RDBMS. I saw the following on a server that was already too busy.


select ... from foo where ... LIMIT 2000000, 10;
select ... from foo where ... LIMIT 2000010, 10;
select ... from foo where ... LIMIT 2000020, 10;

I hope this didn't start at LIMIT 0, 30 as that is a slow way to fetch rows. Each of the queries above scans ~2M rows from an index before returning the last 30 rows. Queries like this have a lousy response time and make mysqld do too much work.

There is a way to make this query efficient but first I will explain why LIMIT a, b is O(N*N). It has been a long time since I took an algorithms course, so I apologize in advance for incorrect terminology. This assumes an index exists for the columns in the ORDER BY clause. When a sequence of LIMIT a, b queries are used to fetch N rows returning B rows per query, then the number of rows processed by mysqld internally is O(N*N). Assume that 300 rows are fetched 10 rows at a time using:

LIMIT 0, 10
LIMIT 0, 20
...
LIMIT 280, 10
LIMIT 290, 10

This requires 30 queries and the number of rows scanned internally is:

10 + 20 + ... + 290 + 300
... or
310 * 15
... or
(300 + 10) * (300 / 10 / 2)
... or
((N+B) * (N / B / 2) when N=300 and B=10
... or
((N*N) / B / 2) + (N/2)

Queries like this can be efficient when using the PRIMARY index or a secondary index that indexes all columns from the PRIMARY index. Indexing all columns from the PRIMARY index is not as horrible as you might think when using InnoDB because all secondary index entries contain the PRIMARY key columns as the row pointer. So the request is to move those columns from the non-indexed part of the index entry to the indexed part of the index entry. An example of how to efficiently restart index scans using this technique is described in http://code.google.com/p/google-mysql-tools/wiki/OnlineDataDrift.

The slow way to do this:

create table foo (
i int primary key,
j int,
key (j)
) engine=innodb;

select j, i from foo order by j desc LIMIT $offset, 10;

The fast way to do this:

create table foo (
i int primary key,
j int,
key (j, i)
) engine=innodb;

select j, i from foo where j = $last_j and i < $last_i or j < $last_j order by j desc, i desc LIMIT 10;

But don't take my word for it. Learn to use show session status like "Handler_read%" to understand the work done by a query.