Customizing the Django ORM, using ZomboDB as an example

  • Tutorial

Often when working with Django and PostgreSQL there is a need for additional database extensions. While with hstore or PostGIS (thanks to GeoDjango), for example, everything is quite convenient, with rarer extensions such as pgRouting, ZomboDB and others you have to either write RawSQL or customize the Django ORM. In this article I propose to do the latter, using ZomboDB and its getting started tutorial as an example. Along the way we will also look at how to wire ZomboDB into a Django project.


PostgreSQL has its own full-text search, and judging by recent benchmarks it works fairly quickly. But its search capabilities still leave much to be desired, so it is hard to get by without Lucene-based solutions such as ElasticSearch. ElasticSearch keeps its own datastore in which the search is performed, and the mainstream approach at the moment is to maintain data consistency between PostgreSQL and ElasticSearch by hand, using signals or manual callback functions.
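
Such manual synchronization usually looks roughly like the sketch below (this is not from the article: the ElasticSearch client setup, index name and the Article model reference are illustrative assumptions), and every write path has to be covered by hand:


from django.db.models.signals import post_save
from django.dispatch import receiver
from elasticsearch import Elasticsearch

from main.models import Article  # hypothetical model, used here only for illustration

es = Elasticsearch(['http://elasticsearch:9200'])


@receiver(post_save, sender=Article)
def index_article(sender, instance, **kwargs):
    # Every save has to be mirrored manually; deletes, bulk updates and
    # rolled-back transactions are easy to miss - exactly the consistency
    # problem ZomboDB is meant to take off our hands.
    es.index(index='articles', id=instance.pk, body={'text': instance.text})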


ZomboDB is an extension that implements its own index type, essentially turning a table into a pointer into ElasticSearch, which allows full-text searches over the table using the ElasticSearch query DSL as part of regular SQL syntax.
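
To give a sense of what that means, a search in the spirit of the getting started tutorial is ordinary SQL with ZomboDB's ==> operator (the table name and query string below are purely illustrative; creating the index itself is shown later in this article):


-- once a zombodb index exists on the table, searching is plain SQL:
SELECT * FROM products WHERE products ==> 'sports or box';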


At the time of writing, searching the web turned up almost nothing: there is only one article about ZomboDB on Habr, and no articles at all on integrating ZomboDB with Django.


The ZomboDB documentation says that calls to ElasticSearch go through its RESTful API, so performance is questionable, but we will not touch on that here, nor on the question of removing ZomboDB correctly without data loss.


We will run all the experiments in Docker, so let's put together a small docker-compose file.


docker-compose.yaml
version: '3'
services:
  postgres:
    build: docker/postgres
    environment:
      - POSTGRES_USER=django
      - POSTGRES_PASSWORD=123456
      - POSTGRES_DB=zombodb
      - PGDATA=/home/postgresql/data
    ports:
      - 5432:5432
  # sudo sysctl -w vm.max_map_count=262144
  elasticsearch:
    image: elasticsearch:6.5.4
    environment:
      - cluster.name=zombodb
      - bootstrap.memory_lock=true
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200
  django:
    build: docker/python
    command: python3 manage.py runserver 0.0.0.0:8000
    volumes:
      - ./:/home/
    ports:
      - 8000:8000
    depends_on:
      - postgres
      - elasticsearch

The latest version of ZomboDB supports PostgreSQL 10 at most and requires curl as a dependency (presumably for making requests to ElasticSearch).


FROM postgres:10
WORKDIR /home/
RUN apt-get -y update && apt-get -y install curl
ADD https://www.zombodb.com/releases/v10-1.0.3/zombodb_jessie_pg10-10-1.0.3_amd64.deb ./
RUN dpkg -i zombodb_jessie_pg10-10-1.0.3_amd64.deb
RUN rm zombodb_jessie_pg10-10-1.0.3_amd64.deb
RUN apt-get -y clean

The container for Django is nothing special; we only install the latest versions of Django and psycopg2 in it.


FROM python:stretch
WORKDIR /home/
RUN pip3 install --no-cache-dir django psycopg2-binary

On Linux, ElasticSearch does not start with the default vm.max_map_count value, so we have to raise it a bit (if anyone knows how to automate this through Docker, please write in the comments).


sudo sysctl -w vm.max_map_count=262144

So, the test environment is ready and we can move on to the Django project. I won't reproduce it in full here; anyone interested can find it in the repository on GitLab. I will dwell only on the key points.
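
For reference, the database connection in settings.py is an ordinary PostgreSQL connection pointing at the postgres container from docker-compose; presumably something like this (the exact settings live in the repository):


DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'zombodb',      # POSTGRES_DB from docker-compose
        'USER': 'django',       # POSTGRES_USER
        'PASSWORD': '123456',   # POSTGRES_PASSWORD
        'HOST': 'postgres',     # the container's DNS name
        'PORT': '5432',
    }
}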


The first thing we need to do is enable ZomboDB as an extension in PostgreSQL. You can, of course, connect to the database and enable the extension with plain SQL: CREATE EXTENSION zombodb;. You can even use the docker-entrypoint-initdb.d hook of the official Postgres container for this. But since we are using Django, we will do it the Django way.
After creating the project and generating the first migration, add the extension creation to it.


from django.db import migrations, models
from django.contrib.postgres.operations import CreateExtension


class Migration(migrations.Migration):
    initial = True

    dependencies = [
    ]

    operations = [
        CreateExtension('zombodb'),
    ]

Secondly, we need a model to describe the test table. For that we need a field that works with the zdb.fulltext data type, so let's write our own. Since for Django this data type behaves exactly like PostgreSQL's native text, we inherit our field class from models.TextField. In addition, two important things need to be done: disable the ability to put a btree index on this field and restrict the database backends it can be used with. The end result looks like this:


class ZomboField(models.TextField):
    description = "Alias for Zombodb field"

    def __init__(self, *args, **kwargs):
        # Force db_index off: a btree index makes no sense here,
        # ZomboDB provides its own index type
        kwargs['db_index'] = False
        super().__init__(*args, **kwargs)

    def db_type(self, connection):
        databases = [
            'django.db.backends.postgresql_psycopg2',
            'django.contrib.gis.db.backends.postgis',
        ]
        if connection.settings_dict['ENGINE'] in databases:
            return 'zdb.fulltext'
        else:
            raise TypeError('This database is not supported')

Third, we need to tell ZomboDB where to find our ElasticSearch. Inside the database this is handled by the custom ZomboDB index itself, so if the address changes, the index has to be changed as well.
Django names tables according to the app_model pattern: in our case the application is called main and the model is article. elasticsearch is the DNS name that Docker assigns based on the container name.
In SQL, it looks like this:


CREATE INDEX idx_main_article
          ON main_article
       USING zombodb ((main_article.*))
        WITH (url='elasticsearch:9200/');

In Django we also need to create a custom index. Indexes there are not very flexible yet: in particular, a zombodb index points not at a specific column but at the whole table, while a Django index requires a mandatory list of fields. So I replace statement.parts['columns'] with ((main_article.*)), but the Index machinery still requires the fields attribute to be specified, and we also need to pass the ElasticSearch URL as an additional WITH parameter. That is why the __init__, create_sql, deconstruct and get_with_params methods are overridden.
Overall the construct turned out to be workable: migrations are applied and reversed without problems.


class ZomboIndex(models.Index):
    def __init__(self, *, url=None, **kwargs):
        self.url = url
        super().__init__(**kwargs)

    def create_sql(self, model, schema_editor, using=''):
        statement = super().create_sql(model, schema_editor, using=' USING zombodb')
        # ZomboDB indexes the whole table, not a single column
        statement.parts['columns'] = '(%s.*)' % model._meta.db_table
        with_params = self.get_with_params()
        if with_params:
            statement.parts['extra'] = " WITH (%s) %s" % (
                ', '.join(with_params),
                statement.parts['extra'],
            )
        print(statement)
        return statement

    def deconstruct(self):
        path, args, kwargs = super().deconstruct()
        if self.url is not None:
            kwargs['url'] = self.url
        return path, args, kwargs

    def get_with_params(self):
        with_params = []
        if self.url:
            with_params.append("url='%s'" % self.url)
        return with_params

Those who do not like this approach can use a RunSQL migration instead and add the index directly; you just have to keep track of the table and index names yourself.


migrations.RunSQL(
    sql = (
        "CREATE INDEX idx_main_article "
        "ON main_article "
        "USING zombodb ((main_article.*)) "
        "WITH (url='elasticsearch:9200/');"
    ),
    reverse_sql='DROP INDEX idx_main_article'
)

As a result we get the following model. ZomboField accepts the same arguments as TextField, with one exception: db_index does not affect anything, just like the fields attribute of ZomboIndex.


class Article(models.Model):
    text = ZomboField()
    class Meta:
        indexes = [
            ZomboIndex(url='elasticsearch:9200/', name='zombo_idx', fields=['text'])
        ]

Ultimately, the migration file should look like this:


from django.db import migrations, models
from django.contrib.postgres.operations import CreateExtension
import main.models


class Migration(migrations.Migration):
    initial = True

    dependencies = [
    ]

    operations = [
        CreateExtension('zombodb'),
        migrations.CreateModel(
            name='Article',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('text', main.models.ZomboField()),
            ],
        ),
        migrations.AddIndex(
            model_name='article',
            index=main.models.ZomboIndex(fields=['text'], name='zombo_idx', url='elasticsearch:9200/'),
        ),
    ]

For those interested, here is the SQL that the Django ORM generates (it can be seen via sqlmigrate, or with Docker: sudo docker-compose exec django python3 manage.py sqlmigrate main 0001):


BEGIN;
--
-- Creates extension zombodb
--
CREATE EXTENSION IF NOT EXISTS "zombodb";
--
-- Create model Article
--
CREATE TABLE "main_article" ("id" serial NOT NULL PRIMARY KEY, "text" zdb.fulltext NOT NULL);
--
-- Create index zombo_idx on field(s) text of model article
--
CREATE INDEX "zombo_idx" ON "main_article" USING zombodb ((main_article.*)) WITH (url='elasticsearch:9200/') ;
COMMIT;
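
Before trying the search queries below, the table needs some data. The rows here are made up; they simply contain the words used in the example queries:


Article.objects.bulk_create([
    Article(text='please call me when the box arrives'),
    Article(text='delete everything from the old box'),
    Article(text='nothing interesting here'),
])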

So, we have a model. Now it only remains to make search work through filter(). To do that, we describe our own lookup and register it.


@ZomboField.register_lookup
class ZomboSearch(models.Lookup):
    lookup_name = 'zombo_search'

    def as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        params = lhs_params + rhs_params
        # ZomboDB expects the table on the left of ==>, so drop the column part
        # from the compiled "table"."column" reference
        return "%s ==> %s" % (lhs.split('.')[0], rhs), params

The search in this case will look like this:


Article.objects.filter(text__zombo_search='(call OR box)')

But usually search alone is not enough: we also want to rank the results and highlight the matched words.
Ranking is fairly straightforward. We write our own function:


from django.db.models import FloatField, Func


class ZomboScore(Func):
    lookup_name = 'score'
    function = 'zdb.score'
    template = "%(function)s(ctid)"
    arity = 0

    @property
    def output_field(self):
        return FloatField()

Now you can build quite complex queries without any problems.


from django.db.models import F

scores = (Article.objects
          .filter(text__zombo_search='delete')
          .annotate(score=ZomboScore())
          .values_list(F('score'))
          .order_by('-score'))

Highlighting the results (highlight) turned out to be a bit trickier, and it did not come out beautifully. The Django psycopg2 backend always turns column_name into "table"."column_name", so our text column becomes "main_article"."text", which ZomboDB categorically refuses to accept: the column must be referenced by its plain textual name only. Here RawSQL comes to the rescue.


from django.db.models.expressions import RawSQL
highlighted = (Article.objects
               .filter(text__zombo_search='delete')
               .values(highlight_text=RawSQL("zdb.highlight(ctid, %s)", ('text',))))
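
Putting it all together, filtering, ranking and highlighting can be combined in a single queryset; this is a sketch assembled from the snippets above, not code from the original project:


results = (Article.objects
           .filter(text__zombo_search='delete')
           .annotate(score=ZomboScore())
           .values('id', 'score',
                   highlight_text=RawSQL("zdb.highlight(ctid, %s)", ('text',)))
           .order_by('-score'))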

The full version of the project with tests can be found in the repository; all the examples from this article are written there as tests. I hope this article will be useful to someone and will encourage you not to reinvent the wheel with signals, where it is so easy to shoot your data consistency in the foot, but to use a ready-made solution without losing the benefits of the ORM. Additions and corrections are welcome.


UPD: the django-zombodb library has since appeared.

