The Pagination and subquery

Sharding-JDBC supports the paging query of MySQL, PostgreSQL and Oracle. For SQLServer, due to its complex paging query, only partial queries can be executed.

The Paging performance

The performance bottleneck

The SQLs with excessive paging offset can result in the poor performance of the database. Take MySQL for example:

SELECT * FROM t_order ORDER BY id LIMIT 1000000, 10

This SQL makes MySQL to retrieve 10 records after skipping over 1000000 records, if no suitable index can be used. In the case of Sharding (assuming two databases for sharding), in order to ensure the result are correct, the SQL will be rewritten as:

SELECT * FROM t_order ORDER BY id LIMIT 0, 1000010

The records before the offset are taken out, and only the last 10 of all the records ordered are obtained. If the database itself is stressed, the SQL will further exacerbate the performance bottleneck, for the original SQL only needs to send 10 records to the client, but the rewritten SQL will send 1000010*2 to the client.

The optimization in Sharding-JDBC for pagination

Two of aspects is optimized in Sharding-JDBC. First, Sharding-JDBC uses streaming + merge sort to avoid excessive memory consumption. Sharding-JDBC uses the rewrote SQLs for the query, and necessarily takes up additional bandwidth, but does not cause the memory a sharp rise. Most of people think that Sharding-JDBC will load 1000010*2 records into memory, which will take up a lot of memory and cause a memory leak. However, because the records of each result set are ordered, sharding-jdbc only compares the current record of each sharding at a time, and only saves the cursor of the current record in memory. The time complexity of merge sort is only O(n), and the loss of performance is very small.

Second, sharding-jdbc also optimizes queries that are only routed to a single slice. The query routing to a single slice can ensure the result are correct without the SQL rewriting. Therefore, Sharding-JDBC will save the bandwidth by means of not rewriting SQLs.

A better solution of pagination

Since the LIMIT queries the data not using the index, a better solution is that you operate paging by using ID, if the ID is sequential:

SELECT * FROM t_order WHERE id > 100000 AND id <= 100010 ORDER BY id

Or you can query the next page of data by recording the ID of the last record of the last query:

SELECT * FROM t_order WHERE id > 100000 LIMIT 10

The paging-related subquery

The paging for both Oracle and SQLServer need to be processed through subquery, and Sharding-JDBC supports paging-related subquery。

Oracle

Sharding-JDBC supports to use rownum for paging:

SELECT * FROM (SELECT row_.*, rownum rownum_ FROM (SELECT o.order_id as order_id FROM t_order o JOIN t_order_item i ON o.order_id = i.order_id) row_ WHERE rownum <= ?) WHERE rownum > ?

At present, it does not supports rownum + BETWEEN for paging.

SQLServer

Sharding-JDBC supports TOP + ROW_NUMBER() OVER for paging:

SELECT * FROM (SELECT TOP (?) ROW_NUMBER() OVER (ORDER BY o.order_id DESC) AS rownum, * FROM t_order o) AS temp WHERE temp.rownum > ? ORDER BY temp.order_id

Sharding-JDBC also supports OFFSET FETCH in SQLServer 2012 or above for paging:

SELECT * FROM t_order o ORDER BY id OFFSET ? ROW FETCH NEXT ? ROWS ONLY

It does not support WITH xxx AS (SELECT …) or two TOP + subquery for paging. Because paging statements generated by Hibernate in SQLServer use the WITH statement, Sharding-JDBC currently does not support SQLServer paging based on Hibernate.

MySQL, PostgreSQL

Both MySQL and PostgreSQL support LIMIT for paging, and subquery is not needed:

SELECT * FROM t_order o ORDER BY id LIMIT ? OFFSET ?

Other subqueries

Sharding-JDBC also supports other subqueries similar with paging subquery. No matter how many layers in a subquery, Sharding-JDBC can always find the first subquery that contains the table data, once the sub-subquery containing table data is found in the lower nest, Sharding-JDBC will throw an exception.

For example, the following subquery is ok:

SELECT COUNT(*) FROM (SELECT * FROM t_order o)

But this subquery will cause exceptions:

SELECT COUNT(*) FROM (SELECT * FROM t_order o WHERE o.id IN (SELECT id FROM t_order WHERE status = ?))

Simply speaking, none-functional subqueries can be supported in most cases, such as pagination or statistics, etc, but subqueries for business is not supported currently.

In addition, subqueries containing aggregate functions are not supported currently due to merge constraints.