Partitioning model tables in Django with PostgreSQL

    Hi.
    This post shows how to set up month-by-month partitioning of a table relatively quickly and painlessly when you use Django + PostgreSQL. Much of what follows applies to other frameworks and ORMs as well.

    You can read about what partitioning is and why it is needed, for example, here, here, and here.

    So, there is a Django project, and the table behind one of its models is expected to grow very large. If reads from this table are frequent and the time period in which a record was made is always known, partitioning will speed up the database.

    I don't want to write the queries that enable partitioning by hand every time, so let's automate. Ideally, the result should be something that a person not very familiar with SQL can use. I've read the docs, so you don't have to.

    First, I'll show how to quickly try the result, and then what's under the hood. The plan is this:
    1. Hook into syncdb so that we can add the partitioning commands.
    2. Plug in SQL that pulls out the existing indexes, creates the partitions, applies the indexes to them, and adds the functions and triggers.
    3. Enable partitioning in the target applications and models.

    First, install the package from the repository:
    pip install git+https://github.com/podshumok/django-post-syncdb-hooks
    

    and add a couple of applications:
    INSTALLED_APPS = (
        # ...
        'post_syncdb_hooks',
        'post_syncdb_hooks.partitioning',
        'your.app',
        # ...
    )
    

    Suppose there is a model in yourapp/models.py:
    from django.db import models
    class MyModel(models.Model):
        date_created = models.DateTimeField()
        my_data = models.IntegerField(db_index=True)
    

    Add the file yourapp/sql/post_syncdb-hook.postgresql_psycopg2.sql (you can adjust the intervals to your needs):
    SELECT month_partition_creation(
        date_trunc('MONTH', NOW())::date,
        date_trunc('MONTH', NOW() + INTERVAL '1 year' )::date,
        'yourapp_mymodel', 'date_created');
    

    run syncdb:
    ./manage.py syncdb
    
    ... and the partitions are created.

    If this is the first syncdb, the newly created partitions will have no indexes. To fix this, run syncdb again.

    Well, the database is now ready, but Django is not quite there yet. The thing is, starting with version 1.3, Django appends RETURNING ... to the INSERT INTO queries it builds for PostgreSQL in order to get the id of the inserted record, and the partitioning method we use does not support this feature.

    We can bluntly force Django not to use RETURNING anywhere:
    from django.db import connections, DEFAULT_DB_ALIAS
    connections[DEFAULT_DB_ALIAS].features.can_return_id_from_insert = False
    

    Or we can edit our model so that RETURNING is skipped only for it:
    from post_syncdb_hooks.partitioning import to_partition
    class MyModel(models.Model):
        "..."
        #...
        @to_partition
        def save(self, *args, **kwargs):
            return super(MyModel, self).save(*args, **kwargs)
    
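    The package's to_partition decorator itself is not shown here; a plausible sketch of the idea (an assumption on my part, not the package's actual code, and framework-free so names like without_returning and get_features are hypothetical) is a wrapper that flips can_return_id_from_insert off for the duration of save() and then restores it:

```python
from functools import wraps

def without_returning(get_features):
    """Hypothetical sketch of what a decorator like to_partition could do:
    disable the RETURNING feature flag around save(), then restore it.

    get_features: callable mapping the model instance to the connection's
    feature-flags object (in Django, connections[alias].features)."""
    def decorator(save_method):
        @wraps(save_method)
        def wrapper(self, *args, **kwargs):
            features = get_features(self)
            previous = features.can_return_id_from_insert
            features.can_return_id_from_insert = False  # no RETURNING for this INSERT
            try:
                return save_method(self, *args, **kwargs)
            finally:
                features.can_return_id_from_insert = previous  # restore the flag
        return wrapper
    return decorator
```

    The try/finally matters: even if save() raises, the flag is restored, so other models on the same connection keep using RETURNING.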

    So, the database is ready and Django is ready, but are we? To keep read queries from polling every partition, we must filter QuerySets by the field on which partitioning is implemented (date_created in the example):
    qs = MyModel.objects.filter(date_created__lte=..., date_created__gt=...)
    
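    For example, a small helper (a sketch, not part of the package; month_bounds is a name I made up) that computes the boundaries of one calendar month, so the filter lines up exactly with the monthly partitions and PostgreSQL can prune the rest:

```python
from datetime import datetime

def month_bounds(year, month):
    """Return (start, end) datetimes covering one calendar month,
    suitable for a date_created__gte / date_created__lt filter."""
    start = datetime(year, month, 1)
    # the exclusive upper bound is the first instant of the next month
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return start, end

# Usage with MyModel from the example above:
# start, end = month_bounds(2013, 5)
# qs = MyModel.objects.filter(date_created__gte=start, date_created__lt=end)
```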

    It is also worth making sure that count() is not run anywhere without real need: the paginator in the admin, for example, likes to do this.
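    One common workaround (a sketch of the general idea, not something the package provides; the class name is mine) is to make the paginator report a fixed large count instead of issuing SELECT COUNT(*) over every partition:

```python
class LargeTableCountMixin:
    """Sketch: mix into a paginator class so it reports a fixed large
    count instead of running SELECT COUNT(*) on a huge partitioned table.
    Page links become approximate, but listing pages stays fast."""
    fixed_count = 999_999_999

    @property
    def count(self):
        return self.fixed_count

# In Django this could be wired into the admin roughly like:
# class NoCountPaginator(LargeTableCountMixin, Paginator):
#     pass
# class MyModelAdmin(admin.ModelAdmin):
#     paginator = NoCountPaginator
```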

    Now everything is ready. That's it.

    Under the hood: post_syncdb_hooks

    post_syncdb_hooks contains a management.py in which a receiver is attached to the post_syncdb signal. This receiver, or hook, is called for every installed application. It checks whether there is an sql folder next to the application's models.py, and if so, whether it contains post_syncdb-hook.sql or post_syncdb-hook.(backend).sql files that can be run.

    post_syncdb_hooks.partitioning simply ships such an SQL file. In it we create an SQL function that takes four parameters: a start date, an end date, a table name, and a field name. When the function is called, partitions of the specified table are created for the selected field, from the start date to the end date in one-month steps, along with a trigger for INSERT INTO and the indexes of the partition tables (if indexes are defined on the master table).
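    To give an idea of what one step of that loop produces, here is a rough Python sketch (my approximation, not the package's SQL; partition_ddl and the child-table naming scheme are assumptions) of the DDL for a single monthly partition under PostgreSQL's inheritance-based partitioning:

```python
from datetime import date

def partition_ddl(table, field, start):
    """Sketch: DDL for one monthly partition in the inheritance-based
    (pre-PostgreSQL-10) scheme. The CHECK constraint lets the planner
    exclude the partition when queries filter on `field`."""
    # exclusive upper bound: first day of the next month
    end = date(start.year + 1, 1, 1) if start.month == 12 else date(start.year, start.month + 1, 1)
    child = "%s_y%04dm%02d" % (table, start.year, start.month)  # hypothetical naming
    return (
        "CREATE TABLE %s (\n"
        "    CHECK (%s >= DATE '%s' AND %s < DATE '%s')\n"
        ") INHERITS (%s);" % (child, field, start, field, end, table)
    )
```

    The real function also has to create the INSERT trigger that routes each row into the right child table, which is what breaks RETURNING as described above.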

    Under the hood: indexes

    The most tedious and difficult part for me was creating indexes for the partitions. When the table-creation SQL is written by hand and we move to a partitioned scheme, there is no problem: copy-paste. But when Django creates the tables, it is not obvious how to keep it from creating indexes on the master table while still saving the corresponding SQL. In the end I decided: let Django create the indexes (they will stay empty anyway), and I will copy them onto the partitions.

    It remains to decide how to do this.

    psql has an option, -E, that makes it print the SQL generated by its internal commands. So:
    sudo -u postgres psql -E db
    
    db=# \di
    ********* QUERY **********
    SELECT n.nspname as "Schema",
      c.relname as "Name",
      CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
      pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
     c2.relname as "Table"
    FROM pg_catalog.pg_class c
         LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
         LEFT JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
         LEFT JOIN pg_catalog.pg_class c2 ON i.indrelid = c2.oid
    WHERE c.relkind IN ('i','')
          AND n.nspname <> 'pg_catalog'
          AND n.nspname <> 'information_schema'
          AND n.nspname !~ '^pg_toast'
      AND pg_catalog.pg_table_is_visible(c.oid)
    ORDER BY 1,2;
    **************************
    db=#
    

    A little copy-paste and we have everything we need to create the indexes for the partition tables.
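    The copying itself boils down to taking each index definition on the master table and rewriting the table and index names for the partition. A crude sketch of that rewriting (my illustration, not the package's code; it assumes the master table name appears both in the index name and in the ON clause, as it does for Django-generated indexes):

```python
def index_ddl_for_partition(master_ddl, master, child):
    """Sketch: rewrite a master-table index definition, in the form
    returned by pg_get_indexdef(), so it applies to one partition.
    Naive string replacement; assumes Django-style naming where the
    table name prefixes the index name."""
    return master_ddl.replace(master, child)

ddl = 'CREATE INDEX yourapp_mymodel_my_data ON yourapp_mymodel USING btree (my_data)'
print(index_ddl_for_partition(ddl, 'yourapp_mymodel', 'yourapp_mymodel_y2013m05'))
# prints: CREATE INDEX yourapp_mymodel_y2013m05_my_data ON yourapp_mymodel_y2013m05 USING btree (my_data)
```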

    Thanks!

    The package presented here is used in a couple of my projects and copes quite well with the tasks assigned to it.
    Thanks to everyone who read this far. I hope someone finds it useful.
    Please report any shortcomings you find in the comments and on GitHub, and suggest improvements and extensions.
