
Five Postgres Pagination Methods, From Basic to Outlandish
You may be surprised to learn that pagination, as common as it is in web applications, is easy to implement inefficiently. In this article we will try various methods of server-side pagination and discuss their convenience when used in PostgreSQL. The article will help you understand which technique is appropriate for your situation, including some you may not have seen before, namely those that rely on physical clustering and the database statistics collector.
Before continuing, let's mention application-side pagination. Some applications transfer all (or most) of the server's data to the client and paginate it there. For small amounts of data, application-side pagination can be a good choice, since it reduces the number of HTTP calls. The approach becomes impractical once records number in the thousands. Server-side pagination has the following advantages:
- Faster initial page load
- More accurate results when the shared data changes
- Faster operations on large data sets
- Encapsulation of business logic
- Better performance on resource-constrained clients
PostgreSQL offers a number of server-side pagination techniques that differ in speed, result integrity (not losing or duplicating records), and support for particular page-access patterns. Not every method works in every situation; some require special data or queries. We will consider the methods in order of generality, starting with those that work with any query and continuing with those that require ordered data. We will finish with a few exotic methods that rely on PostgreSQL internals.
Paginating arbitrary queries
Limit-Offset
The simplest pagination method, limit-offset, is also the riskiest. Unfortunately, it is a staple of web development tutorials. Object-relational mapping (ORM) libraries make it easy and tempting to use, from SQLAlchemy's .slice(1, 3) to ActiveRecord's .limit(1).offset(3) to Sequelize's .findAll({ offset: 3, limit: 1 }). It is no coincidence that limit-offset is used everywhere: you can tack it onto any query without further modification.
ORM limit and offset methods are one thing; pagination helper libraries can be even more deceptive. For example, the popular Kaminari Ruby library uses limit-offset by default while hiding it behind a high-level interface, boiling down to ordinary LIMIT/OFFSET SQL as sketched below.
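For instance, a Kaminari call such as User.page(2).per(25) translates into a plain limit-offset query along these lines (a hedged sketch; the users table is hypothetical, and the exact SQL an ORM emits varies by library and version):
-- Roughly the SQL behind "page 2, 25 records per page"
SELECT * FROM users LIMIT 25 OFFSET 25;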
This technique has two big problems: result inconsistency and offset inefficiency. Consistency means that traversing the results should yield every element exactly once, with no omissions or duplicates. Offset inefficiency refers to the delay incurred when shifting the results by a large offset.
Here's how limit-offset pagination can be inconsistent. Suppose a user moves from page n to page n+1 while, at the same moment, a new element is inserted into page n. This causes both a duplicate (the last element of page n is pushed onto page n+1) and an omission (the new element). Alternatively, suppose an element on page n is deleted just as the user navigates to page n+1. The previously loaded first element of page n+1 shifts onto page n and is skipped.
Now about inefficiency. Large offsets are genuinely expensive. Even when an index exists, the database must scan through storage, counting rows. To use an index we have to filter a column by value, but here we need a certain number of rows regardless of their column values. Furthermore, rows are not required to be the same size on disk, and some may be present on disk but marked as deleted, so the database cannot use simple arithmetic to find a position on disk and start reading results there. Let's measure the slowdown.
-- Create a table of random rows of varying size
CREATE TABLE medley AS
SELECT
generate_series(1,10000000) AS n,
substr(concat(md5(random()::text), md5(random()::text)), 1, (random() * 64)::integer + 1) AS description;
-- Inform the planner of the drastically changed table size
VACUUM ANALYZE;
-- Small offsets are refreshingly fast
EXPLAIN ANALYZE SELECT * FROM medley LIMIT 100;
The estimated cost is quite low:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.85 rows=100 width=38) (actual time=0.008..0.036 rows=100 loops=1)
-> Seq Scan on medley (cost=0.00..185460.60 rows=9999660 width=38) (actual time=0.007..0.017 rows=100 loops=1)
Planning time: 0.040 ms
Execution time: 0.059 ms
(4 rows)
Choosing offset = 1000 raises the cost to about 19 and the runtime to 0.609 ms. Once offset = 5000000, the cost is already 92734 and the execution time 758.484 ms.
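These measurements come from attaching an OFFSET to the same query, roughly as follows (a sketch; the LIMIT of 100 is carried over from the previous query, and the plan output is omitted here):
-- The large-offset case quoted above
EXPLAIN ANALYZE SELECT * FROM medley LIMIT 100 OFFSET 5000000;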
These problems do not necessarily mean limit-offset is inapplicable in your case. In some applications users typically do not navigate many pages deep into the results, and you can even enforce a server-side page limit. If data inconsistency and a bounded page depth are not problems for your application, then limit-offset may suit you well.
When to use: Limit-Offset. Applications with limited pagination depth and tolerance of inconsistency.
Cursors
Despite its shortcomings, limit-offset does have the advantage of being stateless on the server. In contrast stands another pagination method: cursors. Like offsets, cursors can be used with any query, but they differ in that they require the server to hold a dedicated connection and transaction for each HTTP client.
Here's how cursors can be used:
-- We must be inside a transaction
BEGIN;
-- Open a cursor for the query
DECLARE medley_cur CURSOR FOR SELECT * FROM medley;
-- Fetch 10 rows
FETCH 10 FROM medley_cur;
-- ...
-- Fetch 10 more rows, from where we left off last time
FETCH 10 FROM medley_cur;
-- All done
COMMIT;
Cursors have the desirable property of consistent pagination over arbitrary queries, showing the results as they exist in the database at the moment the transaction begins. The transaction's isolation level guarantees that the paginated result does not change.
Every pagination approach has its weaknesses, and cursors are no exception: their costs are resource usage and client-server coupling. Each open transaction consumes dedicated database resources and does not scale to a large number of clients. There are, of course, WITH HOLD cursors that can live outside a transaction, but they must materialize their data. Either way, these drawbacks make cursor pagination suitable only for a narrow range of situations, such as an internal network.
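For reference, a WITH HOLD cursor looks like the following minimal sketch (the cursor name is arbitrary, and the materialization cost mentioned above still applies):
BEGIN;
DECLARE medley_hold_cur CURSOR WITH HOLD FOR SELECT * FROM medley;
-- committing materializes the result set; the cursor outlives the transaction
COMMIT;
FETCH 10 FROM medley_hold_cur;
-- must be closed explicitly, otherwise it persists until the session ends
CLOSE medley_hold_cur;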
Layering HTTP on top of cursors introduces complications. Servers must identify clients across requests, whether through a token or by keeping an identifier such as the client's IP address in a session. Servers must also decide when to release transactions that have sat idle. Finally, server load balancing becomes difficult, since each client must connect to the same specific server every time.
When to use: Cursors. An in-house application on a single server that must paginate queries with varied and changeable ordering, especially when result consistency matters.
Pagination of ordered queries
Keyset Pagination
The techniques above can paginate the results of any kind of query, including queries without an order clause. If we are willing to give up this generality, we can reap optimization benefits. In particular, when sorting by an indexed column, a client can use values from the current page to decide which items to show on the next page. This is called keyset pagination.
Let's return to our example:
-- Add an index for pagination (btrees support inequality)
CREATE INDEX n_idx ON medley USING btree (n);
SELECT * FROM medley ORDER BY n ASC LIMIT 5;
With my random data, it returns:
n | description
---+-------------------------------------------------------------
1 | 74f70e009396
2 | 8dac5a085eb670a29058d
3 | fce303a32e89181bf5df1601487
4 | fddcced2c12e83516b3bd6cc94f23a012dfd
5 | f51ae548dd27f51147e53e839eeceb6b0c92922145276d668e73d4a6621
(5 rows)
Now the client can look at the maximum n in this result and use it to request the next page:
SELECT * FROM medley
WHERE n > 5
ORDER BY n ASC
LIMIT 5;
Even when filtering with n > 5000000, this runs faster than the limit-offset example:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..0.62 rows=5 width=38) (actual time=0.101..0.103 rows=5 loops=1)
-> Index Scan using n_idx on medley (cost=0.43..185579.42 rows=5013485 width=38) (actual time=0.100..0.102 rows=5 loops=1)
Index Cond: (n > 5000000)
Planning time: 0.071 ms
Execution time: 0.119 ms
(5 rows)
This pagination is fast and keeps the data consistent. Any insertions or deletions before the current page leave the results unchanged. The two weaknesses of this method are the lack of random access and possible coupling between client and server.
In general, there is no way to jump to a chosen page without visiting the previous ones to learn their maximum elements. Under certain conditions, however, we can do better. If the values in the indexed column are uniformly distributed (or, better yet, are contiguous numbers with no gaps), the client can do a little math to find the page of interest, because the index makes finding the largest value cheap (a sketch of the direct jump follows the plan below):
EXPLAIN ANALYZE SELECT max(n) FROM medley;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Result (cost=0.46..0.47 rows=1 width=0) (actual time=0.021..0.021 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.43..0.46 rows=1 width=4) (actual time=0.018..0.018 rows=1 loops=1)
-> Index Only Scan Backward using n_idx on medley (cost=0.43..284688.43 rows=10000000 width=4) (actual time=0.017..0.017 rows=1 loops=1)
Index Cond: (n IS NOT NULL)
Heap Fetches: 0
Planning time: 0.087 ms
Execution time: 0.042 ms
(8 rows)
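As an illustration of that math, under the assumption that n is gapless and starts at 1 (true for this test table, but an assumption in general), page i with a page size of 20 can be requested directly; for example, page 3:
-- Hypothetical direct jump, valid only for gapless values of n
-- page i (1-based), page size 20: rows with n in ((i-1)*20, i*20]
SELECT * FROM medley
WHERE n > (3 - 1) * 20
ORDER BY n ASC
LIMIT 20;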
The other issue with keyset pagination, client/server coupling, deserves attention. First, the client does not know which columns are indexed. The server will likely provide an endpoint with a fixed ordering rather than letting the client reorder the results. And since the code given to the client may not know which column the results are ordered by, the server should provide a hint for requesting the next page. RFC 5988 defines the prev and next HTTP link relations for encoding links the client can follow.
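For example, a server might advertise the next page in a response header along these lines (the URL and parameter names are hypothetical):
Link: <https://example.com/medley?after=5&limit=5>; rel="next"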
Since users typically access pages of information linearly, keyset pagination is usually the preferred way to paginate records on high-traffic web servers.
When to use: Keyset pagination. Scalable applications serving data sequentially from column(s) indexed for comparison. Supports filtering.
Outlandish, specialized pagination
Clustered TID Scan
We can build custom pagination methods for special situations using low-level PostgreSQL features. For example, we can get truly random access to the data if we:
- Do not require pages to have the same size
- Support only a single ordering of the paginated rows
The trick is to choose the returned pages so that they correspond to database pages on disk, or to sections of those disk pages. Every table in a PostgreSQL database contains a hidden column called ctid that identifies its row:
SELECT ctid, * FROM medley WHERE n <= 10;
ctid | n | description
--------+----+-------------------------------------------------------------
(0,1) | 1 | 74f70e009396
(0,2) | 2 | 8dac5a085eb670a29058d
(0,3) | 3 | fce303a32e89181bf5df1601487
(0,4) | 4 | fddcced2c12e83516b3bd6cc94f23a012dfd
(0,5) | 5 | f51ae548dd27f51147e53e839eeceb6b0c92922145276d668e73d4a6621
(0,6) | 6 | eb9fe1dfe1e421903f96b3b5c5dfe1ee1253582d728c35b4ee7330b
(0,7) | 7 | e95202d7f5c612f8523ae705d
(0,8) | 8 | 6573b64aff262a2b940326
(0,9) | 9 | a0a43
(0,10) | 10 | 82cdc134bd249a612cfddd3088dd09e32de5f4fa33
(10 rows)
Each ctid has the form (page, row). PostgreSQL can fetch rows by ctid very quickly; in fact, this is how indexes work internally: they map column values to ctids.
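As a quick sketch of such a direct lookup (the tuple address is simply the first one from the listing above), a single row can be fetched by its physical location:
-- Fetches one row by its physical address using a Tid Scan
SELECT * FROM medley WHERE ctid = '(0,1)'::tid;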
Note that although PostgreSQL defines an ordering on the tid type, it cannot efficiently retrieve ctids by inequality:
EXPLAIN ANALYZE SELECT count(1) FROM medley WHERE ctid >= '(0,1)'::tid AND ctid < '(1,0)'::tid;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=235589.00..235589.01 rows=1 width=0) (actual time=1241.851..1241.852 rows=1 loops=1)
-> Seq Scan on medley (cost=0.00..235464.00 rows=50000 width=0) (actual time=477.933..1241.802 rows=116 loops=1)
Filter: ((ctid >= '(0,1)'::tid) AND (ctid < '(1,0)'::tid))
Rows Removed by Filter: 9999884
Planning time: 0.047 ms
Execution time: 1241.889 ms
(6 rows)
Range queries do not help, but there is still a way to efficiently request all rows on a disk page. Every page contains current_setting('block_size') bytes of data (typically 8k). Rows are referenced by 32-bit pointers, so there are at most block_size/4 rows per page. (In fact, rows are usually wider than the minimum size, and a quarter of the block size gives an upper bound on the number of rows per page.) The following sequence generates all possible ctids on the j-th page:
SELECT ('(' || j || ',' || s.i || ')')::tid
FROM generate_series(0,current_setting('block_size')::int/4) AS s(i);
Let's use it to fetch all the rows of our example table that sit on page zero:
SELECT * FROM medley WHERE ctid = ANY (ARRAY
(SELECT ('(0,' || s.i || ')')::tid
FROM generate_series(0,current_setting('block_size')::int/4) AS s(i)
)
);
The planner estimated the cost of this query at cost=25.03..65.12, and it runs in 2.765 ms. Requesting page number 10000 costs the same. So we get truly random access; what's not to love?
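The query for any other page simply substitutes a different page number for the 0 above; for instance, a sketch for page 10000:
-- Same technique, aimed at disk page 10000
SELECT * FROM medley WHERE ctid = ANY (ARRAY
  (SELECT ('(10000,' || s.i || ')')::tid
  FROM generate_series(0,current_setting('block_size')::int/4) AS s(i)
  )
);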
There are three drawbacks:
- When rows are deleted, they leave holes in the pages.
- The ordering of the rows may not be meaningful. The database inserts new rows into holes left by deleted rows, which causes rows to fall out of sequence.
- WHERE clauses are not supported.
In some situations this is not a problem. One case is data whose natural order corresponds to the order of insertion, such as append-only time-series data. Another is data that rarely changes. This is because we can control the placement of rows on pages with the CLUSTER command.
Let's get back to our example. Its rows are stored on disk ordered by the n column in ascending order, because that is the order in which we inserted them. But what if we want to sort them by the description column? To do so, we have to physically rebuild the table by creating an index on the description column and clustering on it:
CREATE INDEX description_idx ON medley USING btree (description);
CLUSTER medley USING description_idx;
Now fetching all the rows of the first page returns data sorted alphabetically by the description column. If the table changes, new rows will fall out of alphabetical order, but as long as the table stays unchanged the returned items will be perfectly ordered. The table can also be re-clustered periodically after changes, although this operation locks the table and cannot be run while people need to access it.
Finally, you can determine the total number of pages in the table from its total size in bytes:
SELECT pg_relation_size('medley') / current_setting('block_size')::int;
When to use: TID Scan. When fast random access is needed and no filtering is required. Works especially well with append-only time-series data whose rows have roughly constant width.
Keyset with Estimated Bookmarks
As we have seen, plain keyset pagination offers no way to jump to a specific page, short of guessing the key values. However, the PostgreSQL statistics collector maintains per-column histograms of value distribution. We can use these estimates together with limits and small offsets to get fast, random-access pagination through a hybrid approach.
First, let's look at the statistics for our example:
SELECT array_length(histogram_bounds, 1) - 1
FROM pg_stats
WHERE tablename = 'medley'
AND attname = 'n';
In my database, column n has 101 histogram bounds, i.e. 100 ranges between them. The specific values are not bunched up, since the data is uniformly distributed:
{719,103188,193973,288794, … ,9690475,9791775,9905770,9999847}
Note that the values are approximate. The first number is not exactly 0 and the last is not exactly ten million. The ranges divide our data into blocks of size B = 10,000,000 / 100 = 100,000 rows.
We can use the histogram ranges from the PostgreSQL statistics collector to obtain probabilistically correct pages. If we choose a client-side page size of W, how do we get the i-th page? It will lie in block iW / B, at offset iW % B within that block.
Choosing W = 20, let's request page 270000 from our test table.
WITH bookmark AS (
SELECT (histogram_bounds::text::int[])[((270000 * 20) / 100000)+1] AS start,
(histogram_bounds::text::int[])[((270000 * 20) / 100000)+2] AS stop
FROM pg_stats
WHERE tablename = 'medley'
AND attname = 'n'
LIMIT 1
)
SELECT *
FROM medley
WHERE n >= (select start from bookmark)
AND n < (select stop from bookmark)
ORDER BY n ASC
LIMIT 20
OFFSET ((270000 * 20) % 100000);
This is super fast (note that the offset happens to be near zero in this case). The query returns rows from n = 5407259 through 5407278. The true values on page 270000 are n = 5400001 through 5400020. The error is 7239, or about 0.1%.
We were lucky with the choice of page here. By contrast, page 74999 requires an offset of 99980. We do know, though, that our offset will be at most 100,000. This upper bound is under our control if we are willing to make a trade-off: by tuning the PostgreSQL statistics collector we can get a more precise histogram for the column.
ALTER TABLE medley ALTER COLUMN n SET statistics 1000;
VACUUM ANALYZE;
Now there are 1000 histogram bounds instead of 100. In my database they look like this:
{10,10230,20863, …, 9980444,9989948,9999995}
With this setting our offset will be at most 10,000. The trade-off is that the query planner now has to examine more values, which slows it down slightly. So this method is a compromise between offset inefficiency and statistics collector overhead.
This hybrid keyset/offset method is probably not a fit for many real pagination applications, since WHERE conditions cannot be used with it. It is also inaccurate, and it only becomes more inaccurate as the table changes or when the statistics collector has not run in a while.
When to use: Keyset with estimated bookmarks. When the client needs deep but only approximately positioned random access, with no additional filtering.
Conclusion
Like many engineering decisions, choosing a pagination technique involves trade-offs. It is safe to say that keyset pagination is the best fit for the average site with ordered, linear access. However, even limit-offset has its strengths, and the more exotic methods offer special performance characteristics for particular kinds of data. As you can see, there are quite a few possibilities. Pick the right tool for the job and don't let pagination be a closed book.