
Partitioning model tables in Django with PostgreSQL
Hey.
This is a post on how to set up partitioning of a table by month relatively quickly and painlessly if you use Django + PostgreSQL. Much of what follows applies to other frameworks and ORMs as well.
You can read about what partitioning is and why it is needed, for example, here, here, and here.
So: there is a Django project, and the table behind one of its models is going to be very large. If that table is read frequently, and the time period in which a record was made is always known, partitioning will speed up the database.
I don't want to write the partitioning queries by hand every time, so let's automate it. Ideally the result should be usable even by someone who is not very familiar with SQL. I've read the docs, so you don't have to.
First I'll show how to quickly try what I came up with, and then what's under the hood. The plan is this:
- Hook into syncdb so that we can add the partitioning commands.
- Wire in SQL that pulls out the existing indexes, creates the partitions, activates indexes on them, and adds the functions and triggers.
- Activate partitioning in the end applications and models.
First, install the package from the repository:

```shell
pip install git+https://github.com/podshumok/django-post-syncdb-hooks
```
and connect a couple of applications:

```python
INSTALLED_APPS = (
    # ...
    'post_syncdb_hooks',
    'post_syncdb_hooks.partitioning',
    'your.app',
    # ...
)
```
Suppose there is a model in `yourapp/models.py`:

```python
from django.db import models

class MyModel(models.Model):
    date_created = models.DateTimeField()
    my_data = models.IntegerField(db_index=True)
```
Add the file `yourapp/sql/post_syncdb-hook.postgresql_psycopg2.sql` (you can edit the intervals to fit your needs):

```sql
SELECT month_partition_creation(
    date_trunc('MONTH', NOW())::date,
    date_trunc('MONTH', NOW() + INTERVAL '1 year')::date,
    'yourapp_mymodel', 'date_created');
```
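The SQL above asks for one partition per month over a one-year window. If you want to double-check which monthly boundaries that covers, the same month-stepping can be mirrored in plain Python (a sketch for verification only; it is not part of the package):

```python
from datetime import date

def month_starts(start, end):
    """Yield the first day of each month from start (inclusive) to end
    (exclusive) -- the boundaries month_partition_creation() walks over."""
    year, month = start.year, start.month
    while date(year, month, 1) < end:
        yield date(year, month, 1)
        month += 1
        if month > 12:
            month, year = 1, year + 1

# One year starting May 2013 gives twelve monthly partitions:
bounds = list(month_starts(date(2013, 5, 1), date(2014, 5, 1)))
```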
Run `syncdb`:

```shell
./manage.py syncdb
```

...and the partitions are created. If this is the first `syncdb`, the freshly created partitions will have no indexes; to fix this, run `syncdb` once more.

The database is now ready, but Django is not quite there yet. The thing is, starting with version 1.3, Django appends `RETURNING ...` to its PostgreSQL `INSERT INTO` queries in order to get the id of the inserted record, and the partitioning method we use does not support this feature. We can bluntly forbid Django to use `RETURNING` anywhere:

```python
from django.db import connections, DEFAULT_DB_ALIAS
connections[DEFAULT_DB_ALIAS].features.can_return_id_from_insert = False
```
Or we can edit our model so that `RETURNING` is skipped only for it:

```python
from post_syncdb_hooks.partitioning import to_partition

class MyModel(models.Model):
    "..."
    #...
    @to_partition
    def save(self, *args, **kwargs):
        return super(MyModel, self).save(*args, **kwargs)
```
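For context on why `RETURNING` breaks here: with trigger-based partitioning, a trigger on the master table redirects every inserted row into the child table for its month, so the `INSERT` against the master table itself returns nothing. The routing decision the trigger makes is simple enough to sketch in Python (the `_yYYYYmMM` naming scheme below is illustrative; check the SQL generated by the package for the actual child-table names):

```python
from datetime import date

def partition_name(table, field_value):
    """Pick the child table an inserted row belongs to, by the month of the
    partitioning field -- the same decision the INSERT trigger makes in PL/pgSQL."""
    return "%s_y%04dm%02d" % (table, field_value.year, field_value.month)

# A row dated 2013-05-17 lands in the May 2013 child table:
name = partition_name("yourapp_mymodel", date(2013, 5, 17))
```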
So, the database is ready and Django is ready, but are we? To keep read queries from hitting every partition, we must filter our `QuerySet`s by the field the table is partitioned on (in the example, `date_created`):

```python
qs = MyModel.objects.filter(date_created__lte=..., date_created__gt=...)
```
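The planner can only skip partitions when the bounds are explicit in the query, so it is convenient to compute the month boundaries once and reuse them. A hypothetical helper (the function is mine, not part of the package):

```python
from datetime import datetime

def month_bounds(year, month):
    """Return (start, end) covering one calendar month, suitable for
    date_created__gte / date_created__lt filters that hit a single partition."""
    start = datetime(year, month, 1)
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return start, end

start, end = month_bounds(2013, 5)
# qs = MyModel.objects.filter(date_created__gte=start, date_created__lt=end)
```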
It is also worth making sure that no `count()` over all records happens anywhere without a real need: the paginator in the admin, for example, is fond of doing exactly that.

Now everything is ready. That's all.
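One way to spare the admin a full `COUNT(*)` across every partition is to hand the paginator a cheap, precomputed estimate (for example, from `pg_class.reltuples`). A minimal standalone sketch of the idea; in a real project you would subclass `django.core.paginator.Paginator` and override its `count` the same way:

```python
class EstimatingPaginator:
    """Paginator-style object that reports a precomputed estimate instead of
    running COUNT(*) over every partition."""

    def __init__(self, object_list, per_page, estimated_count):
        self.object_list = object_list
        self.per_page = per_page
        self._count = estimated_count  # e.g. pg_class.reltuples for the master table

    @property
    def count(self):
        return self._count

    @property
    def num_pages(self):
        # Ceiling division without floats; always at least one page.
        return max(1, -(-self._count // self.per_page))

pages = EstimatingPaginator(object_list=[], per_page=25, estimated_count=1000)
```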
Under the hood: post_syncdb_hooks

`post_syncdb_hooks` contains a `management.py` that connects a receiver to the `post_syncdb` signal. This receiver, or hook, is called for every installed application. It checks whether there is an `sql` folder next to the application's `models.py`, and if there is, whether that folder contains a `post_syncdb-hook.sql` or `post_syncdb-hook.(backend).sql` file that can be run.

`post_syncdb_hooks.partitioning` simply ships such an SQL file. It creates an SQL function that takes four parameters: a start date, an end date, a table name, and a field name. Calling the function creates partitions of the given table on the chosen field, from the start date to the end date in steps of one month, together with a trigger for `INSERT INTO` and indexes on the partition tables (if indexes are defined on the master table).

Under the hood: indexes
The most tedious and difficult part for me was creating the indexes for partitions. When the table-creation SQL is written by hand and we move to a partitioned scheme, there is no problem: copy-paste. But when Django creates the tables, it is not at all obvious how to keep it from creating indexes on the master table while still saving the corresponding SQL. In the end I decided: let Django create the indexes (they will stay empty anyway), and I will copy them onto the partitions.
It remains to figure out how to do this. `psql` has an `-E` option that makes it print the SQL generated by its internal backslash commands. So:

```
sudo -u postgres psql -E db
db=# \di
********* QUERY **********
SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
  c2.relname as "Table"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
     LEFT JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
     LEFT JOIN pg_catalog.pg_class c2 ON i.indrelid = c2.oid
WHERE c.relkind IN ('i','')
      AND n.nspname <> 'pg_catalog'
      AND n.nspname <> 'information_schema'
      AND n.nspname !~ '^pg_toast'
      AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
**************************
db=#
```
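Each row of that catalog query identifies an index, and `pg_catalog.pg_get_indexdef(oid)` yields the full `CREATE INDEX` statement for it. Rewriting such a statement for one partition then comes down to a string substitution (a sketch; it assumes both the index name and the table name simply gain the partition suffix):

```python
def index_sql_for_partition(create_index_sql, table, suffix):
    """Rewrite a master-table CREATE INDEX statement for one partition:
    every occurrence of the table name (including the one embedded in the
    index name) gets the partition suffix, which also keeps index names unique."""
    return create_index_sql.replace(table, "%s_%s" % (table, suffix))

sql = "CREATE INDEX yourapp_mymodel_my_data ON yourapp_mymodel USING btree (my_data)"
partition_sql = index_sql_for_partition(sql, "yourapp_mymodel", "y2013m05")
```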
A little copy-paste from there, and we have everything we need to create indexes for the partition tables.
Thanks!
The package presented here is used in a couple of my projects and copes with its tasks quite successfully.
Thanks to everyone who read this far. I hope some of it proves useful to someone.
Please point out any shortcomings you find in the comments and on GitHub, and suggest improvements and extensions.