Table bloat? Never heard of it…



    Many PostgreSQL users are probably familiar with the behavior that causes tables to bloat, known as table bloat. It shows up under intensive data modification, whether frequent UPDATEs or INSERT/DELETE churn, and it degrades performance. Let's look at why it happens and how to deal with it.


    Tables in PostgreSQL are stored as 8 KB pages into which rows are placed. When a page is completely filled, a new page is appended to the table. When rows are deleted with DELETE or changed with UPDATE, the space occupied by the old row versions cannot be reused immediately. First the autovacuum process, or a manual VACUUM, must scan the modified pages and mark that space as free; only then can new rows safely be written there. If autovacuum cannot keep up, for example because data is changing too actively or simply because of poor settings, new pages are needlessly added to the table as new rows arrive. And even after cleanup finally reaches the dead rows, the extra pages remain. The table ends up sparser in terms of row density. This is called the table bloat effect.
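
    To see the effect with your own eyes, here is a minimal sketch; the table bloat_demo, the row count, and the payload width are made up purely for illustration:

    -- bloat_demo is a hypothetical scratch table used only for this demonstration
    CREATE TABLE bloat_demo (id int PRIMARY KEY, payload text);
    INSERT INTO bloat_demo
        SELECT g, repeat('x', 500) FROM generate_series(1, 100000) g;
    SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- baseline heap size
    UPDATE bloat_demo SET payload = payload;  -- every row gets a new version; the old ones become dead
    VACUUM bloat_demo;  -- marks the dead space as reusable...
    SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- ...but the heap stays roughly twice as large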

    The cleanup procedure, autovacuum or VACUUM, can reduce the size of a table by cutting off completely empty pages, but only if they sit at the very end of the table. To shrink a table as much as possible, PostgreSQL offers VACUUM FULL and CLUSTER, but both take heavy, long-held locks on the table, which is far from always an acceptable solution.
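
    For comparison, a sketch of the heavyweight route on the same hypothetical demo table; note that VACUUM FULL rewrites the table into a new file and holds an ACCESS EXCLUSIVE lock for the whole rewrite, blocking even reads:

    VACUUM FULL bloat_demo;  -- returns the space to the OS, but nobody can touch the table meanwhile
    SELECT pg_size_pretty(pg_relation_size('bloat_demo'));  -- back to roughly the baseline size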

    Consider one of the solutions. When a row is updated, if there is free space in the table, the new row version goes into that free space without new pages being allocated, and free space closer to the beginning of the table is preferred. So if you update rows on the last page with so-called fake updates, such as SET some_column = some_column, at some point all rows from the last page will migrate into free space on earlier pages of the table. After several such passes, the last pages become empty, and a regular non-blocking VACUUM can cut them off the table, reducing its size.
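
    A hand-rolled sketch of one such pass on the hypothetical bloat_demo table (assume it has been bloated again as in the first sketch); the page number 1310 below is made up, substitute the one the first query reports. The utility described below automates, repeats, and throttles exactly this loop:

    -- pages are 8 KB, so the number of the last page is size / 8192 - 1
    SELECT pg_relation_size('bloat_demo') / 8192 - 1 AS last_page;
    -- "fake-update" the rows sitting on that page; ctid looks like '(page,offset)',
    -- so a text match on the page number picks them out
    UPDATE bloat_demo SET payload = payload
        WHERE ctid::text LIKE '(1310,%';
    -- a pass may need repeating: if the page still has free space, a new row
    -- version can land right back on it (a so-called HOT update)
    VACUUM bloat_demo;  -- once the tail pages are empty, VACUUM truncates them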

    As a result, this technique lets you compact a table almost to its minimum size without taking heavy locks, which means without interfering with other sessions or the normal operation of the database.
    And now for the most important part :) There is a utility, pgcompacttable, that automates this procedure.

    Its main features:
    • requires no dependencies other than Perl >= 5.8.8, i.e. you can simply copy pgcompacttable to the server and run it;
    • works through the DBD::Pg or DBD::PgPP adapters, or even through the standard psql utility if neither is on the server;
    • processes individual tables as well as all tables within a schema, a database, or the entire cluster;
    • can exclude databases, schemas, or tables from processing;
    • analyzes the bloat effect and processes only the tables where it is present; for more accurate estimates it is recommended to install the pgstattuple extension (see the snippet after this list);
    • analyzes and rebuilds bloated indexes;
    • analyzes and rebuilds bloated unique constraints and primary keys;
    • works incrementally, i.e. you can stop the compaction process at any point without harming anything;
    • adjusts dynamically to the current database load so as not to affect the performance of user queries (adjustable at startup);
    • gives administrators recommendations, accompanied by ready-made DDL, for rebuilding database objects that cannot be rebuilt automatically.
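
    As mentioned in the list above, bloat estimates are more accurate with pgstattuple. A minimal check on our hypothetical demo table, using the stock columns of that extension:

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    -- dead_tuple_percent and free_percent together show how sparse the table is
    SELECT tuple_percent, dead_tuple_percent, free_percent
      FROM pgstattuple('bloat_demo');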


    A couple of usage examples.
    Running on the entire cluster with mandatory index rebuilding:
    # pgcompacttable --all --reindex

    Running on a single table (with index rebuilding):
    # pgcompacttable --reindex --dbname geodata --table cities --verbose info > pgcompacttable.log 2>&1

    Result:
    The table size decreased from 9.2 GB to 5.6 GB. The total size of all indexes decreased from 7.5 GB to 2.8 GB.


    Project URL: github.com/PostgreSQL-Consulting/pgcompacttable

    Many thanks to the authors of the utility for their work! This is a really useful tool.

    Thanks for your attention!
