Not very big data

    This article covers the features of built-in, or declarative, partitioning in PostgreSQL 12. The demonstration was prepared for a talk of the same name at the HighLoad++ Siberia 2019 conference (upd: a video of the talk is now available).

    All examples were run on the recently released beta version:

    => SELECT version();
    
                                                         version                                                      
    ------------------------------------------------------------------------------------------------------------------
     PostgreSQL 12beta1 on i686-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 32-bit
    (1 row)

    The examples use the bookings and tickets tables of the demo database. The bookings table contains bookings for three months, from June to August 2017, and has the following structure:

    => \d bookings
    
                            Table "bookings.bookings"
        Column    |           Type           | Collation | Nullable | Default 
    --------------+--------------------------+-----------+----------+---------
     book_ref     | character(6)             |           | not null | 
     book_date    | timestamp with time zone |           | not null | 
     total_amount | numeric(10,2)            |           | not null | 
    Indexes:
        "bookings_pkey" PRIMARY KEY, btree (book_ref)
    Referenced by:
        TABLE "tickets" CONSTRAINT "tickets_book_ref_fkey" FOREIGN KEY (book_ref) REFERENCES bookings(book_ref)
    

    A booking may include several tickets. The structure of the tickets table:

    => \d tickets
                            Table "bookings.tickets"
         Column     |         Type          | Collation | Nullable | Default 
    ----------------+-----------------------+-----------+----------+---------
     ticket_no      | character(13)         |           | not null | 
     book_ref       | character(6)          |           | not null | 
     passenger_id   | character varying(20) |           | not null | 
     passenger_name | text                  |           | not null | 
     contact_data   | jsonb                 |           |          | 
    Indexes:
        "tickets_pkey" PRIMARY KEY, btree (ticket_no)
    Foreign-key constraints:
        "tickets_book_ref_fkey" FOREIGN KEY (book_ref) REFERENCES bookings(book_ref)
    Referenced by:
        TABLE "ticket_flights" CONSTRAINT "ticket_flights_ticket_no_fkey" FOREIGN KEY (ticket_no) REFERENCES tickets(ticket_no)
    

    This information should be enough to understand the examples, in which we will try to partition these tables.

    → Learn more about the demo database here.

    Range Partitioning


    First, let's try partitioning the bookings table by date ranges. The table would be created like this:

    => CREATE TABLE bookings_range (
           book_ref     character(6),
           book_date    timestamptz,
           total_amount numeric(10,2)
       ) PARTITION BY RANGE(book_date);
    

    Separate partitions for each month:

    => CREATE TABLE bookings_range_201706 PARTITION OF bookings_range 
           FOR VALUES FROM ('2017-06-01'::timestamptz) TO ('2017-07-01'::timestamptz);
    => CREATE TABLE bookings_range_201707 PARTITION OF bookings_range 
           FOR VALUES FROM ('2017-07-01'::timestamptz) TO ('2017-08-01'::timestamptz);
    

    Partition bounds can be specified not only as constants but also as expressions, for example a function call. The expression is evaluated once, when the partition is created, and the result is stored in the system catalog:

    => CREATE TABLE bookings_range_201708 PARTITION OF bookings_range 
           FOR VALUES FROM (to_timestamp('01.08.2017','DD.MM.YYYY')) 
                        TO (to_timestamp('01.09.2017','DD.MM.YYYY'));
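The evaluated bounds can be inspected in the system catalog; a small sketch (the partition name follows the examples above):

```sql
-- The bound expression is evaluated at CREATE time and stored
-- in pg_class.relpartbound; pg_get_expr renders it as text.
SELECT pg_get_expr(relpartbound, oid)
  FROM pg_class
 WHERE relname = 'bookings_range_201708';
```

The output shows the constant timestamps, not the to_timestamp calls, confirming that the expressions were evaluated only once.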
    

    Description of the table:

    => \d+ bookings_range
    
                                       Partitioned table "bookings.bookings_range"
        Column    |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description 
    --------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
     book_ref     | character(6)             |           |          |         | extended |              | 
     book_date    | timestamp with time zone |           |          |         | plain    |              | 
     total_amount | numeric(10,2)            |           |          |         | main     |              | 
    Partition key: RANGE (book_date)
    Partitions: bookings_range_201706 FOR VALUES FROM ('2017-06-01 00:00:00+03') TO ('2017-07-01 00:00:00+03'),
                bookings_range_201707 FOR VALUES FROM ('2017-07-01 00:00:00+03') TO ('2017-08-01 00:00:00+03'),
                bookings_range_201708 FOR VALUES FROM ('2017-08-01 00:00:00+03') TO ('2017-09-01 00:00:00+03')

    That's all there is to it. No trigger is needed to route inserted rows, and no CHECK constraints are needed. The constraint_exclusion parameter is not needed either; you can even turn it off:

    => SET constraint_exclusion = OFF;
    

    Load the data; rows are routed to partitions automatically:

    => INSERT INTO bookings_range SELECT * FROM bookings;
    INSERT 0 262788

    The declarative syntax still does not hide the underlying inherited tables, so the distribution of rows across partitions can be seen with this query:

    => SELECT tableoid::regclass, count(*) FROM bookings_range GROUP BY tableoid;
           tableoid        | count  
    -----------------------+--------
     bookings_range_201706 |   7303
     bookings_range_201707 | 167062
     bookings_range_201708 |  88423
    (3 rows)

    But there is no data in the parent table:

    => SELECT * FROM ONLY bookings_range;
    
     book_ref | book_date | total_amount 
    ----------+-----------+--------------
    (0 rows)

    Let's check partition pruning in the query plan:

    => EXPLAIN (COSTS OFF) 
       SELECT * FROM bookings_range WHERE book_date = '2017-07-01'::timestamptz;
    
                                     QUERY PLAN                                 
    ----------------------------------------------------------------------------
     Seq Scan on bookings_range_201707
       Filter: (book_date = '2017-07-01 00:00:00+03'::timestamp with time zone)
    (2 rows)

    Only one partition is scanned, as expected.

    The following example uses the to_timestamp function, which has volatility category STABLE, instead of a constant:

    => EXPLAIN (COSTS OFF) 
       SELECT * FROM bookings_range WHERE book_date = to_timestamp('01.07.2017','DD.MM.YYYY');
    
                                         QUERY PLAN                                     
    ------------------------------------------------------------------------------------
     Append
       Subplans Removed: 2
       ->  Seq Scan on bookings_range_201707
             Filter: (book_date = to_timestamp('01.07.2017'::text, 'DD.MM.YYYY'::text))
    (4 rows)

    The function is evaluated when the query plan is initialized, and some of the partitions are pruned (the Subplans Removed line).

    But this only works for SELECT. For data-modifying statements, partition pruning based on STABLE function values has not yet been implemented:

    => EXPLAIN (COSTS OFF) 
       DELETE FROM bookings_range WHERE book_date = to_timestamp('01.07.2017','DD.MM.YYYY');
    
                                         QUERY PLAN                                     
    ------------------------------------------------------------------------------------
     Delete on bookings_range
       Delete on bookings_range_201706
       Delete on bookings_range_201707
       Delete on bookings_range_201708
       ->  Seq Scan on bookings_range_201706
             Filter: (book_date = to_timestamp('01.07.2017'::text, 'DD.MM.YYYY'::text))
       ->  Seq Scan on bookings_range_201707
             Filter: (book_date = to_timestamp('01.07.2017'::text, 'DD.MM.YYYY'::text))
       ->  Seq Scan on bookings_range_201708
             Filter: (book_date = to_timestamp('01.07.2017'::text, 'DD.MM.YYYY'::text))
    (10 rows)

    Therefore, you should use constants:

    => EXPLAIN (COSTS OFF) 
       DELETE FROM bookings_range WHERE book_date = '2017-07-01'::timestamptz;
    
                                        QUERY PLAN                                    
    ----------------------------------------------------------------------------------
     Delete on bookings_range
       Delete on bookings_range_201707
       ->  Seq Scan on bookings_range_201707
             Filter: (book_date = '2017-07-01 00:00:00+03'::timestamp with time zone)
    (4 rows)

    Sorting by index


    The following query has to sort the rows gathered from the different partitions, so the plan contains a Sort node and the startup cost is high:

    => EXPLAIN SELECT * FROM bookings_range ORDER BY book_date;
    
                                            QUERY PLAN                                        
    ------------------------------------------------------------------------------------------
     Sort  (cost=24649.77..25077.15 rows=170952 width=52)
       Sort Key: bookings_range_201706.book_date
       ->  Append  (cost=0.00..4240.28 rows=170952 width=52)
             ->  Seq Scan on bookings_range_201706  (cost=0.00..94.94 rows=4794 width=52)
             ->  Seq Scan on bookings_range_201707  (cost=0.00..2151.30 rows=108630 width=52)
             ->  Seq Scan on bookings_range_201708  (cost=0.00..1139.28 rows=57528 width=52)
    (6 rows)

    Let's create an index on book_date. Instead of a single global index, an index is created on each partition:

    => CREATE INDEX book_date_idx ON bookings_range(book_date);
    

    => \di bookings_range*
                                        List of relations
      Schema  |                Name                 | Type  |  Owner  |         Table         
    ----------+-------------------------------------+-------+---------+-----------------------
     bookings | bookings_range_201706_book_date_idx | index | student | bookings_range_201706
     bookings | bookings_range_201707_book_date_idx | index | student | bookings_range_201707
     bookings | bookings_range_201708_book_date_idx | index | student | bookings_range_201708
    (3 rows)

    The query with sorting can now use the indexes on the partition key and return the rows from the partitions already in sorted order. The Sort node is gone, and the startup cost of producing the first row is minimal:

    => EXPLAIN SELECT * FROM bookings_range ORDER BY book_date;
    
                                                               QUERY PLAN                                                           
    --------------------------------------------------------------------------------------------------------------------------------
     Append  (cost=1.12..14880.88 rows=262788 width=52)
       ->  Index Scan using bookings_range_201706_book_date_idx on bookings_range_201706  (cost=0.28..385.83 rows=7303 width=52)
       ->  Index Scan using bookings_range_201707_book_date_idx on bookings_range_201707  (cost=0.42..8614.35 rows=167062 width=52)
       ->  Index Scan using bookings_range_201708_book_date_idx on bookings_range_201708  (cost=0.42..4566.76 rows=88423 width=52)
    (4 rows)

    Partitioned indexes created this way are maintained centrally: when a new partition is added, an index is created on it automatically. And you cannot drop the index on a single partition:

    => DROP INDEX bookings_range_201706_book_date_idx;
    
    ERROR:  cannot drop index bookings_range_201706_book_date_idx because index book_date_idx requires it
    HINT:  You can drop index book_date_idx instead.

    Only as a whole:

    => DROP INDEX book_date_idx;
    
    DROP INDEX

    CREATE INDEX ... CONCURRENTLY


    When creating an index on a partitioned table, you cannot specify CONCURRENTLY.

    But you can do the following. First, create the index on the parent table only; it will be created in the invalid state:

    => CREATE INDEX book_date_idx ON ONLY bookings_range(book_date);
    

    => SELECT indisvalid FROM pg_index WHERE indexrelid::regclass::text = 'book_date_idx';
    
     indisvalid 
    ------------
     f
    (1 row)
    

    Then create indexes on each partition with the CONCURRENTLY option:

    => CREATE INDEX CONCURRENTLY book_date_201706_idx ON bookings_range_201706 (book_date);
    => CREATE INDEX CONCURRENTLY book_date_201707_idx ON bookings_range_201707 (book_date);
    => CREATE INDEX CONCURRENTLY book_date_201708_idx ON bookings_range_201708 (book_date);
    

    Now attach the local indexes to the parent one:

    => ALTER INDEX book_date_idx ATTACH PARTITION book_date_201706_idx;
    => ALTER INDEX book_date_idx ATTACH PARTITION book_date_201707_idx;
    => ALTER INDEX book_date_idx ATTACH PARTITION book_date_201708_idx;
    

    This is similar to attaching table partitions, which we will look at a bit later. As soon as all index partitions are attached, the parent index changes its status:

    => SELECT indisvalid FROM pg_index WHERE indexrelid::regclass::text = 'book_date_idx';
    
     indisvalid 
    ------------
     t
    (1 row)

    Attaching and detaching partitions


    Automatic creation of partitions is not provided. They must therefore be created in advance, before rows with new partition key values are inserted into the table.
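A common workaround is a small maintenance script run on a schedule. Here is one hypothetical sketch (the table and naming scheme follow the bookings_range examples above) that pre-creates the partition for the coming month:

```sql
-- Create next month's partition ahead of time, e.g. from a cron job.
-- IF NOT EXISTS makes the script safe to re-run.
DO $$
DECLARE
    from_date date := date_trunc('month', now() + interval '1 month');
    to_date   date := from_date + interval '1 month';
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF bookings_range
             FOR VALUES FROM (%L) TO (%L)',
        'bookings_range_' || to_char(from_date, 'YYYYMM'),
        from_date, to_date);
END
$$;
```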

    Let's create a new partition while other transactions are working with the table, and look at the locks as we go:

    => BEGIN;
    => SELECT count(*) FROM bookings_range
        WHERE book_date = to_timestamp('01.07.2017','DD.MM.YYYY');
    
     count 
    -------
         5
    (1 row)
    
    => SELECT relation::regclass::text, mode FROM pg_locks 
        WHERE pid = pg_backend_pid() AND relation::regclass::text LIKE 'bookings%';
    
           relation        |      mode       
    -----------------------+-----------------
     bookings_range_201708 | AccessShareLock
     bookings_range_201707 | AccessShareLock
     bookings_range_201706 | AccessShareLock
     bookings_range        | AccessShareLock
    (4 rows)

    At the start of the statement, an AccessShareLock is taken on the parent table, all partitions, and their indexes. The to_timestamp function is evaluated, and partitions are pruned, only after that. Had a constant been used instead of the function, only the parent table and the bookings_range_201707 partition would have been locked. So specify constants in queries whenever possible; otherwise the number of rows in pg_locks grows in proportion to the number of partitions, which may force you to increase max_locks_per_transaction.

    Without committing the previous transaction, let's create the next partition, for September, in another session:

        || => CREATE TABLE bookings_range_201709 (LIKE bookings_range);
        || => BEGIN;
        || => ALTER TABLE bookings_range ATTACH PARTITION bookings_range_201709
              FOR VALUES FROM ('2017-09-01'::timestamptz) TO ('2017-10-01'::timestamptz);
        || => SELECT relation::regclass::text, mode FROM pg_locks 
              WHERE pid = pg_backend_pid() AND relation::regclass::text LIKE 'bookings%';
    
                  relation               |           mode           
    -------------------------------------+--------------------------
     bookings_range_201709_book_date_idx | AccessExclusiveLock
     bookings_range                      | ShareUpdateExclusiveLock
     bookings_range_201709               | ShareLock
     bookings_range_201709               | AccessExclusiveLock
    (4 rows)
    

    When a new partition is attached, a ShareUpdateExclusiveLock, which is compatible with AccessShareLock, is taken on the parent table. Therefore, attaching partitions does not conflict with queries against the partitioned table.

    => COMMIT;

        || => COMMIT;

    A partition is detached with the ALTER TABLE ... DETACH PARTITION command. The partition itself is not deleted; it becomes an independent table. Data can be read from it, it can be dropped, and, if needed, it can be attached back (ATTACH PARTITION).

    Another way to detach a partition is simply to drop it with the DROP TABLE command.

    Unfortunately, both options, DROP TABLE and DETACH PARTITION, take an AccessExclusiveLock on the parent table.
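A sketch of the detach/attach cycle, using the partition names from the examples above:

```sql
-- DETACH turns the partition into a standalone table; the data stays in it
-- and can be queried or archived independently:
ALTER TABLE bookings_range DETACH PARTITION bookings_range_201706;

-- If needed, the table can be attached back with the same bounds:
ALTER TABLE bookings_range ATTACH PARTITION bookings_range_201706
    FOR VALUES FROM ('2017-06-01'::timestamptz) TO ('2017-07-01'::timestamptz);
```

Note that ATTACH PARTITION scans the table to verify that all rows satisfy the declared bounds, unless a matching CHECK constraint already exists.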

    Default partition


    If you try to insert a row for which no partition has been created yet, you get an error. If that is not the desired behavior, you can create a default partition:
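For example, while no default partition exists, an insert that falls outside all declared bounds fails; a sketch (the row values are hypothetical, and the exact error text may differ between versions):

```sql
-- No partition covers January 2018, so the insert is rejected:
INSERT INTO bookings_range VALUES ('XX9999', '2018-01-01'::timestamptz, 0);
-- ERROR:  no partition of relation "bookings_range" found for row
```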

    => CREATE TABLE bookings_range_default PARTITION OF bookings_range DEFAULT;

    Suppose that, when inserting a row, someone mistyped the date, omitting the millennium:

    => INSERT INTO bookings_range VALUES('XX0000', '0017-09-01'::timestamptz, 0) 
       RETURNING tableoid::regclass, *;
    
            tableoid        | book_ref |          book_date           | total_amount 
    ------------------------+----------+------------------------------+--------------
     bookings_range_default | XX0000   | 0017-09-01 00:00:00+02:30:17 |         0.00
    (1 row)
    INSERT 0 1

    Note that the RETURNING clause shows the new row landing in the default partition.

    After the date is corrected (which changes the partition key), the row automatically moves to the proper partition; no triggers are needed:

    => UPDATE bookings_range SET book_date = '2017-09-01'::timestamptz WHERE book_ref = 'XX0000' 
       RETURNING tableoid::regclass, *;
    
           tableoid        | book_ref |       book_date        | total_amount 
    -----------------------+----------+------------------------+--------------
     bookings_range_201709 | XX0000   | 2017-09-01 00:00:00+03 |         0.00
    (1 row)
    UPDATE 1

    List Partitioning


    In the demo database, the book_ref column is the primary key of the bookings table. However, the chosen partitioning scheme does not allow such a key to be created:

    => ALTER TABLE bookings_range ADD PRIMARY KEY(book_ref);
    
    ERROR:  insufficient columns in PRIMARY KEY constraint definition
    DETAIL:  PRIMARY KEY constraint on table "bookings_range" lacks column "book_date" which is part of the partition key.

    The partition key must be part of the primary key.

    To partition by month and still include book_ref in the primary key, let's try another scheme for the bookings table: partitioning by a list of values. To do this, we add a redundant book_month column as the partition key:

    => CREATE TABLE bookings_list (
           book_ref     character(6),
           book_month   character(6),
           book_date    timestamptz NOT NULL,
           total_amount numeric(10,2),
           PRIMARY KEY (book_ref, book_month)
       ) PARTITION BY LIST(book_month);

    We will create the partitions dynamically, based on the data in the bookings table:

    => WITH dates AS (
           SELECT date_trunc('month',min(book_date)) min_date, 
                  date_trunc('month',max(book_date)) max_date 
             FROM bookings
       ), partition AS (
           SELECT to_char(g.month, 'YYYYMM') AS book_month 
             FROM dates, 
                  generate_series(dates.min_date, dates.max_date, '1 month'::interval) AS g(month)
       ) 
       SELECT format('CREATE TABLE %I PARTITION OF bookings_list FOR VALUES IN (%L)',
                      'bookings_list_' || partition.book_month, partition.book_month) 
         FROM partition\gexec
    
    CREATE TABLE
    CREATE TABLE
    CREATE TABLE

    Here's what happened:

    => \d+ bookings_list
    
                                        Partitioned table "bookings.bookings_list"
        Column    |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description 
    --------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
     book_ref     | character(6)             |           | not null |         | extended |              | 
     book_month   | character(6)             |           | not null |         | extended |              | 
     book_date    | timestamp with time zone |           | not null |         | plain    |              | 
     total_amount | numeric(10,2)            |           |          |         | main     |              | 
    Partition key: LIST (book_month)
    Indexes:
        "bookings_list_pkey" PRIMARY KEY, btree (book_ref, book_month)
    Partitions: bookings_list_201706 FOR VALUES IN ('201706'),
                bookings_list_201707 FOR VALUES IN ('201707'),
                bookings_list_201708 FOR VALUES IN ('201708')

    Load the data, distributing rows across the partitions:

    => INSERT INTO bookings_list(book_ref,book_month,book_date,total_amount) 
       SELECT book_ref,to_char(book_date, 'YYYYMM'),book_date,total_amount
       FROM bookings;
    
    INSERT 0 262788
    

    A small digression. To fill book_month automatically, it is tempting to use a new feature of version 12, generated columns (GENERATED ALWAYS AS). Unfortunately, a generated column cannot be used as a partition key, so book_month has to be filled in some other way.

    Integrity constraints such as CHECK and NOT NULL can be created on a partitioned table. As with regular inheritance, the INHERIT/NOINHERIT clause specifies whether the constraint should be inherited by the partitions. The default is INHERIT:

    => ALTER TABLE bookings_range ALTER COLUMN book_date SET NOT NULL;
    

    => \d bookings_range
    
                   Partitioned table "bookings.bookings_range"
        Column    |           Type           | Collation | Nullable | Default 
    --------------+--------------------------+-----------+----------+---------
     book_ref     | character(6)             |           |          | 
     book_date    | timestamp with time zone |           | not null | 
     total_amount | numeric(10,2)            |           |          | 
    Partition key: RANGE (book_date)
    Indexes:
        "book_date_idx" btree (book_date)
    Number of partitions: 5 (Use \d+ to list them.)
    

    => \d bookings_range_201706
    
                      Table "bookings.bookings_range_201706"
        Column    |           Type           | Collation | Nullable | Default 
    --------------+--------------------------+-----------+----------+---------
     book_ref     | character(6)             |           |          | 
     book_date    | timestamp with time zone |           | not null | 
     total_amount | numeric(10,2)            |           |          | 
    Partition of: bookings_range FOR VALUES FROM ('2017-06-01 00:00:00+03') TO ('2017-07-01 00:00:00+03')
    Indexes:
        "book_date_201706_idx" btree (book_date)

    An EXCLUDE constraint can only be created locally, on individual partitions.

    A search by book_ref alone has to look in every partition, but it can use the index, since book_ref is the leading column of the primary key:

    => EXPLAIN (COSTS OFF)
       SELECT * FROM bookings_list WHERE book_ref = '00000F';
    
                                    QUERY PLAN                                
    --------------------------------------------------------------------------
     Append
       ->  Index Scan using bookings_list_201706_pkey on bookings_list_201706
             Index Cond: (book_ref = '00000F'::bpchar)
       ->  Index Scan using bookings_list_201707_pkey on bookings_list_201707
             Index Cond: (book_ref = '00000F'::bpchar)
       ->  Index Scan using bookings_list_201708_pkey on bookings_list_201708
             Index Cond: (book_ref = '00000F'::bpchar)
    (7 rows)

    A search by both book_ref and book_month looks only in the matching partition:

    => EXPLAIN (COSTS OFF)
       SELECT * FROM bookings_list WHERE book_ref = '00000F' AND book_month = '201707';
    
                                        QUERY PLAN                                     
    -----------------------------------------------------------------------------------
     Index Scan using bookings_list_201707_pkey on bookings_list_201707
       Index Cond: ((book_ref = '00000F'::bpchar) AND (book_month = '201707'::bpchar))
    (2 rows)

    The INSERT ... ON CONFLICT command correctly finds the target partition and performs the update:

    => INSERT INTO bookings_list VALUES ('XX0001','201708','2017-08-01',0)
       RETURNING tableoid::regclass, *;
    
           tableoid       | book_ref | book_month |       book_date        | total_amount 
    ----------------------+----------+------------+------------------------+--------------
     bookings_list_201708 | XX0001   | 201708     | 2017-08-01 00:00:00+03 |         0.00
    (1 row)
    INSERT 0 1
    

    => INSERT INTO bookings_list VALUES ('XX0001','201708','2017-08-01',100)
       ON CONFLICT(book_ref,book_month) DO UPDATE SET total_amount = 100
       RETURNING tableoid::regclass, *;
    
           tableoid       | book_ref | book_month |       book_date        | total_amount 
    ----------------------+----------+------------+------------------------+--------------
     bookings_list_201708 | XX0001   | 201708     | 2017-08-01 00:00:00+03 |       100.00
    (1 row)
    INSERT 0 1

    Foreign keys


    In the demo database, the tickets table references bookings.

    To make the foreign key possible, we add a book_month column to tickets as well, and partition it by month, just like bookings_list:

    => CREATE TABLE tickets_list (
           ticket_no character(13),
           book_month character(6),
           book_ref  character(6) NOT NULL,
           passenger_id varchar(20) NOT NULL,
           passenger_name text NOT NULL,
           contact_data jsonb,
           PRIMARY KEY (ticket_no, book_month),
           FOREIGN KEY (book_ref, book_month) REFERENCES bookings_list (book_ref, book_month)
       ) PARTITION BY LIST (book_month);
    

    The FOREIGN KEY constraint deserves a closer look. On the one hand, it is a foreign key from a partitioned table (tickets_list); on the other hand, it is a foreign key referencing a partitioned table (bookings_list). So foreign keys involving partitioned tables are supported in both directions.

    Create the partitions:

    => WITH dates AS (
           SELECT date_trunc('month',min(book_date)) min_date,
                  date_trunc('month',max(book_date)) max_date 
             FROM bookings
       ), partition AS (
           SELECT to_char(g.month, 'YYYYMM') AS book_month 
             FROM dates,
                  generate_series(dates.min_date, dates.max_date, '1 month'::interval) AS g(month)
       ) 
       SELECT format('CREATE TABLE %I PARTITION OF tickets_list FOR VALUES IN (%L)',
                      'tickets_list_' || partition.book_month, partition.book_month) 
         FROM partition\gexec
    
    CREATE TABLE
    CREATE TABLE
    CREATE TABLE
    

    => \d+ tickets_list
    
                                        Partitioned table "bookings.tickets_list"
         Column     |         Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
    ----------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
     ticket_no      | character(13)         |           | not null |         | extended |              | 
     book_month     | character(6)          |           | not null |         | extended |              | 
     book_ref       | character(6)          |           | not null |         | extended |              | 
     passenger_id   | character varying(20) |           | not null |         | extended |              | 
     passenger_name | text                  |           | not null |         | extended |              | 
     contact_data   | jsonb                 |           |          |         | extended |              | 
    Partition key: LIST (book_month)
    Indexes:
        "tickets_list_pkey" PRIMARY KEY, btree (ticket_no, book_month)
    Foreign-key constraints:
        "tickets_list_book_ref_book_month_fkey" FOREIGN KEY (book_ref, book_month) REFERENCES bookings_list(book_ref, book_month)
    Partitions: tickets_list_201706 FOR VALUES IN ('201706'),
                tickets_list_201707 FOR VALUES IN ('201707'),
                tickets_list_201708 FOR VALUES IN ('201708')

    Load the data:

    => INSERT INTO tickets_list
           (ticket_no,book_month,book_ref,passenger_id,passenger_name,contact_data)
           SELECT t.ticket_no,b.book_month,t.book_ref,
                  t.passenger_id,t.passenger_name,t.contact_data
           FROM bookings_list b JOIN tickets t ON (b.book_ref = t.book_ref);
    
    INSERT 0 366733
    

    => VACUUM ANALYZE tickets_list;
    

    The distribution of rows across partitions:

    => SELECT tableoid::regclass, count(*) FROM tickets_list GROUP BY tableoid;
    
          tableoid       | count  
    ---------------------+--------
     tickets_list_201706 |  10160
     tickets_list_201707 | 232755
     tickets_list_201708 | 123818
    (3 rows)

    Joins and aggregation


    Let's join two tables partitioned in the same way:

    => EXPLAIN (COSTS OFF) 
       SELECT b.*
         FROM bookings_list b JOIN tickets_list t 
              ON (b.book_ref = t.book_ref and b.book_month = t.book_month);
    
                                     QUERY PLAN                                 
    ----------------------------------------------------------------------------
     Hash Join
       Hash Cond: ((t.book_ref = b.book_ref) AND (t.book_month = b.book_month))
       ->  Append
             ->  Seq Scan on tickets_list_201706 t
             ->  Seq Scan on tickets_list_201707 t_1
             ->  Seq Scan on tickets_list_201708 t_2
       ->  Hash
             ->  Append
                   ->  Seq Scan on bookings_list_201706 b
                   ->  Seq Scan on bookings_list_201707 b_1
                   ->  Seq Scan on bookings_list_201708 b_2
    (11 rows)

    Before the join begins, each table first appends the partitions that satisfy the query conditions.

    But we could instead join the corresponding monthly partitions of the two tables first, and then append the results. This is what the enable_partitionwise_join parameter enables:

    => SET enable_partitionwise_join = ON;
    => EXPLAIN (COSTS OFF) 
       SELECT b.*
         FROM bookings_list b JOIN tickets_list t
              ON (b.book_ref = t.book_ref and b.book_month = t.book_month);
    
                                            QUERY PLAN                                        
    ------------------------------------------------------------------------------------------
     Append
       ->  Hash Join
             Hash Cond: ((t.book_ref = b.book_ref) AND (t.book_month = b.book_month))
             ->  Seq Scan on tickets_list_201706 t
             ->  Hash
                   ->  Seq Scan on bookings_list_201706 b
       ->  Hash Join
             Hash Cond: ((t_1.book_ref = b_1.book_ref) AND (t_1.book_month = b_1.book_month))
             ->  Seq Scan on tickets_list_201707 t_1
             ->  Hash
                   ->  Seq Scan on bookings_list_201707 b_1
       ->  Hash Join
             Hash Cond: ((t_2.book_ref = b_2.book_ref) AND (t_2.book_month = b_2.book_month))
             ->  Seq Scan on tickets_list_201708 t_2
             ->  Hash
                   ->  Seq Scan on bookings_list_201708 b_2
    (16 rows)

    Now the corresponding partitions of the two tables are joined first, and the results of the joins are then appended.

    The situation with aggregation is similar:

    => EXPLAIN (COSTS OFF) SELECT count(*) FROM bookings_list;
    
                                QUERY PLAN                             
    -------------------------------------------------------------------
     Finalize Aggregate
       ->  Gather
             Workers Planned: 2
             ->  Partial Aggregate
                   ->  Parallel Append
                         ->  Parallel Seq Scan on bookings_list_201707
                         ->  Parallel Seq Scan on bookings_list_201708
                         ->  Parallel Seq Scan on bookings_list_201706
    (8 rows)

    Note that the partition scans can be performed in parallel. But the partitions are appended first, and only then does aggregation begin. Alternatively, aggregation can be performed within each partition, with the results combined afterwards:

    => SET enable_partitionwise_aggregate = ON;
    => EXPLAIN (COSTS OFF) SELECT count(*) FROM bookings_list;
    
                                QUERY PLAN                             
    -------------------------------------------------------------------
     Finalize Aggregate
       ->  Gather
             Workers Planned: 2
             ->  Parallel Append
                   ->  Partial Aggregate
                         ->  Parallel Seq Scan on bookings_list_201707
                   ->  Partial Aggregate
                         ->  Parallel Seq Scan on bookings_list_201708
                   ->  Partial Aggregate
                         ->  Parallel Seq Scan on bookings_list_201706
    (10 rows)

    These features are especially important when some of the partitions are foreign tables. Both parameters are disabled by default, because they increase planning time while not always producing a better plan.
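
    If partitionwise plans do pay off for a particular workload, both parameters can also be enabled cluster-wide rather than per session. A sketch (whether the extra planning time is acceptable has to be checked against that workload):

    => ALTER SYSTEM SET enable_partitionwise_join = ON;
    => ALTER SYSTEM SET enable_partitionwise_aggregate = ON;
    => SELECT pg_reload_conf();

    ALTER SYSTEM writes the settings to postgresql.auto.conf, and pg_reload_conf() makes them the default for new sessions.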

    Hash Partitioning


    The third way to partition a table is hash partitioning.

    Creating a table:

    => CREATE TABLE bookings_hash (
           book_ref     character(6) PRIMARY KEY,
           book_date    timestamptz  NOT NULL,
           total_amount numeric(10,2)
       ) PARTITION BY HASH(book_ref);
    

    In this variant book_ref is the partition key, so it can be declared the primary key right away.
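
    For comparison: a primary key on a partitioned table must include all partition key columns, so a declaration like the following (a hypothetical table, shown only to illustrate the error) is rejected:

    => CREATE TABLE bookings_bad (
           book_ref     character(6) PRIMARY KEY,
           book_date    timestamptz  NOT NULL
       ) PARTITION BY RANGE(book_date);
    
    ERROR:  insufficient columns in PRIMARY KEY constraint definition
    DETAIL:  PRIMARY KEY constraint on table "bookings_bad" lacks column "book_date" which is part of the partition key.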

    Let's divide it into three partitions:

    => CREATE TABLE bookings_hash_p0 
           PARTITION OF bookings_hash FOR VALUES WITH (MODULUS 3, REMAINDER 0);
    => CREATE TABLE bookings_hash_p1 
           PARTITION OF bookings_hash FOR VALUES WITH (MODULUS 3, REMAINDER 1);
    => CREATE TABLE bookings_hash_p2 
           PARTITION OF bookings_hash FOR VALUES WITH (MODULUS 3, REMAINDER 2);
    

    We fill the table, with rows routed to partitions automatically:

    => INSERT INTO bookings_hash SELECT * FROM bookings;
    
    INSERT 0 262788

    The rows are distributed across the partitions evenly:

    => SELECT tableoid::regclass AS partition, count(*) FROM bookings_hash GROUP BY tableoid;
    
        partition     | count 
    ------------------+-------
     bookings_hash_p0 | 87649
     bookings_hash_p1 | 87651
     bookings_hash_p2 | 87488
    (3 rows)
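
    With hash partitioning, partition pruning works for equality conditions on the partition key: the planner hashes the constant and keeps only the matching partition. For example (the book_ref value here is arbitrary; which of the three partitions remains in the plan depends on its hash):

    => EXPLAIN (COSTS OFF) SELECT * FROM bookings_hash WHERE book_ref = '000004';

    Only one of the three partitions appears in the resulting plan. Range or inequality conditions, on the other hand, cannot prune hash partitions.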

    A new psql command lists partitioned objects:

    => \dP+ 
    
                                          List of partitioned relations
      Schema  |        Name        |  Owner  |       Type        |     Table      | Total size | Description 
    ----------+--------------------+---------+-------------------+----------------+------------+-------------
     bookings | bookings_hash      | student | partitioned table |                | 13 MB      | 
     bookings | bookings_list      | student | partitioned table |                | 15 MB      | 
     bookings | bookings_range     | student | partitioned table |                | 13 MB      | 
     bookings | tickets_list       | student | partitioned table |                | 50 MB      | 
     bookings | book_date_idx      | student | partitioned index | bookings_range | 5872 kB    | 
     bookings | bookings_hash_pkey | student | partitioned index | bookings_hash  | 5800 kB    | 
     bookings | bookings_list_pkey | student | partitioned index | bookings_list  | 8120 kB    | 
     bookings | tickets_list_pkey  | student | partitioned index | tickets_list   | 19 MB      | 
    (8 rows)
    

    => VACUUM ANALYZE bookings_hash;
    

    Subqueries and nested loop joins


    Run-time partition pruning is also possible with nested loop joins.

    Here is how the first 10 bookings are distributed across the partitions:

    => WITH top10 AS (
         SELECT tableoid::regclass AS partition, * FROM bookings_hash ORDER BY book_ref LIMIT 10
       ) SELECT partition, count(*) FROM top10 GROUP BY 1 ORDER BY 1;
    
        partition     | count 
    ------------------+-------
     bookings_hash_p0 |     3
     bookings_hash_p1 |     3
     bookings_hash_p2 |     4
    (3 rows)

    Let's look at the execution plan of a query joining the bookings_hash table with the previous subquery:

    => EXPLAIN (ANALYZE,COSTS OFF,TIMING OFF) 
         WITH top10 AS (
             SELECT tableoid::regclass AS partition, * FROM bookings ORDER BY book_ref LIMIT 10
       ) SELECT bh.* FROM bookings_hash bh JOIN top10 ON bh.book_ref = top10.book_ref;
    
                                                 QUERY PLAN                                              
    -----------------------------------------------------------------------------------------------------
     Nested Loop (actual rows=10 loops=1)
       ->  Limit (actual rows=10 loops=1)
             ->  Index Only Scan using bookings_pkey on bookings (actual rows=10 loops=1)
                   Heap Fetches: 0
       ->  Append (actual rows=1 loops=10)
             ->  Index Scan using bookings_hash_p0_pkey on bookings_hash_p0 bh (actual rows=1 loops=3)
                   Index Cond: (book_ref = bookings.book_ref)
             ->  Index Scan using bookings_hash_p1_pkey on bookings_hash_p1 bh_1 (actual rows=1 loops=3)
                   Index Cond: (book_ref = bookings.book_ref)
             ->  Index Scan using bookings_hash_p2_pkey on bookings_hash_p2 bh_2 (actual rows=1 loops=4)
                   Index Cond: (book_ref = bookings.book_ref)
     Planning Time: 0.632 ms
     Execution Time: 0.278 ms
    (13 rows)

    The join is performed with a nested loop. The outer loop over the common table expression is executed 10 times. But note the number of executions of the partition scans (loops): for each book_ref value from the outer loop, only the one partition of bookings_hash that actually stores that value is scanned.

    Compare this with partition pruning disabled:

    => SET enable_partition_pruning TO OFF;
    => EXPLAIN (ANALYZE,COSTS OFF,TIMING OFF) 
         WITH top10 AS (
             SELECT tableoid::regclass AS partition, * FROM bookings ORDER BY book_ref LIMIT 10
       ) SELECT bh.* FROM bookings_hash bh JOIN top10 ON bh.book_ref = top10.book_ref;
    
                                                  QUERY PLAN                                              
    ------------------------------------------------------------------------------------------------------
     Nested Loop (actual rows=10 loops=1)
       ->  Limit (actual rows=10 loops=1)
             ->  Index Only Scan using bookings_pkey on bookings (actual rows=10 loops=1)
                   Heap Fetches: 0
       ->  Append (actual rows=1 loops=10)
             ->  Index Scan using bookings_hash_p0_pkey on bookings_hash_p0 bh (actual rows=0 loops=10)
                   Index Cond: (book_ref = bookings.book_ref)
             ->  Index Scan using bookings_hash_p1_pkey on bookings_hash_p1 bh_1 (actual rows=0 loops=10)
                   Index Cond: (book_ref = bookings.book_ref)
             ->  Index Scan using bookings_hash_p2_pkey on bookings_hash_p2 bh_2 (actual rows=0 loops=10)
                   Index Cond: (book_ref = bookings.book_ref)
     Planning Time: 0.886 ms
     Execution Time: 0.771 ms
    (13 rows)
    

    => RESET enable_partition_pruning;
    

    If we reduce the selection to a single booking, two of the partitions are not touched at all:

    => EXPLAIN (ANALYZE,COSTS OFF,TIMING OFF) 
         WITH top AS (
             SELECT tableoid::regclass AS partition, * FROM bookings ORDER BY book_ref LIMIT 1
       ) SELECT bh.* FROM bookings_hash bh JOIN top ON bh.book_ref = top.book_ref;
    
                                                QUERY PLAN                                             
    ---------------------------------------------------------------------------------------------------
     Nested Loop (actual rows=1 loops=1)
       ->  Limit (actual rows=1 loops=1)
             ->  Index Only Scan using bookings_pkey on bookings (actual rows=1 loops=1)
                   Heap Fetches: 0
       ->  Append (actual rows=1 loops=1)
             ->  Index Scan using bookings_hash_p0_pkey on bookings_hash_p0 bh (actual rows=1 loops=1)
                   Index Cond: (book_ref = bookings.book_ref)
             ->  Index Scan using bookings_hash_p1_pkey on bookings_hash_p1 bh_1 (never executed)
                   Index Cond: (book_ref = bookings.book_ref)
             ->  Index Scan using bookings_hash_p2_pkey on bookings_hash_p2 bh_2 (never executed)
                   Index Cond: (book_ref = bookings.book_ref)
     Planning Time: 0.250 ms
     Execution Time: 0.090 ms
    (13 rows)

    Instead of a subquery, a set-returning function with volatility category STABLE can be used:

    => CREATE OR REPLACE FUNCTION get_book_ref(top int) RETURNS SETOF bookings AS $$
           BEGIN
               RETURN QUERY EXECUTE 'SELECT * FROM bookings ORDER BY book_ref LIMIT $1' 
                            USING top;
           END;$$ LANGUAGE plpgsql STABLE;
    

    => EXPLAIN (ANALYZE,COSTS OFF,TIMING OFF) 
           SELECT * FROM bookings_hash bh JOIN get_book_ref(10) f ON bh.book_ref = f.book_ref;
    
                                                 QUERY PLAN                                              
    -----------------------------------------------------------------------------------------------------
     Nested Loop (actual rows=10 loops=1)
       ->  Function Scan on get_book_ref f (actual rows=10 loops=1)
       ->  Append (actual rows=1 loops=10)
             ->  Index Scan using bookings_hash_p0_pkey on bookings_hash_p0 bh (actual rows=1 loops=3)
                   Index Cond: (book_ref = f.book_ref)
             ->  Index Scan using bookings_hash_p1_pkey on bookings_hash_p1 bh_1 (actual rows=1 loops=3)
                   Index Cond: (book_ref = f.book_ref)
             ->  Index Scan using bookings_hash_p2_pkey on bookings_hash_p2 bh_2 (actual rows=1 loops=4)
                   Index Cond: (book_ref = f.book_ref)
     Planning Time: 0.175 ms
     Execution Time: 0.843 ms
    (11 rows)

    Summary


    To sum up: built-in (declarative) partitioning in PostgreSQL 12 offers a rich set of features and can safely be recommended as a replacement for partitioning via inheritance.
