Magikan July 21, 2014 at 15:12

Django-orm in search of LEFT JOIN

From the sandbox

It has long ceased to be a secret that Django-ORM is generally stupid as a stick and unable to solve more or less serious tasks, and especially stupid in cases where it is necessary to influence the formation of reasonable SQL queries from the outside. About one of these cases and how I tried to deal with it - I’ll tell you under the cut.

It all started with the fact that picking the TecDoc database inspired me to implement my own translation storage system in the database. Without hesitation, I threw such models for the application of translations and one for bullying:

class Translations(models.Model):
    " переводы "
    text = models.TextField(null=True, blank=True)
    lng = models.SlugField(max_length=32, choices=settings.LANGUAGES, db_index=True)
    des = models.ForeignKey("Designations", db_index=True, related_name='translations')
    class Meta:
        verbose_name = _("translation")
        verbose_name_plural = _("translations")
        ordering = ['lng']
        # db_table='mlang_translations'
class Designations(models.Model):
    " описание (метка) перевода содержит только поле id"
    class Meta:
        verbose_name = _("designation")
        verbose_name_plural = _("designations")
        # db_table='mlang_designations'
class Page(MPTTModel):
    content = models.ForeignKey('mlang.Designations', null=True, blank=True, related_name="+")
    keywords = models.ForeignKey('mlang.Designations', null=True, blank=True, related_name="+")
    description = models.ForeignKey('mlang.Designations', null=True, blank=True, related_name="+")
    title = models.ForeignKey('mlang.Designations', null=True, blank=True, related_name="+")
    code = models.CharField(max_length=256, db_index=True)
    parent = TreeForeignKey('self', null=True, blank=True)
    # db_table='flatpages_page'

It works as follows:

Many models refer to translation tags, after which you can get a translation for a field in one of the languages. The number of requests in the simplest case will be

1 + количество полей, которые надо перевести * количество объектов в выборке

.
As can be seen from the description of the models, the translated fields and the translations themselves refer to the same label, which makes it easy to force the label itself when selecting translations due to the direct JOIN of the translation to the desired field. And here dances with a tambourine begin.

Let's start with the “forehead” option, which is better not to use if there are no other options: QuerySet.raw and get the following code:

Page.objects.raw("""
select
        fpage.id id,
        content_translated.text content_translated,
        title_translated.text title_translated,
        keywords_translated.text keywords_translated,
        description_translated.text description_translated
from flatpages_page fpage
        left join mlang_translations content_translated on fpage.content_id=content_translated.des_id and content_translated.lng=%s
        left join mlang_translations description_translated on fpage.description_id=description_translated.des_id  and description_translated.lng=%s
        left join mlang_translations keywords_translated on fpage.keywords_id=keywords_translated.des_id and keywords_translated.lng=%s
        left join mlang_translations title_translated on fpage.title_id=title_translated.des_id and title_translated.lng=%s
""", params=["ru", "ru", "ru", "ru"])

I think there is no need to paint the pros and cons of this approach.
Naturally, if there are many models and / or you need to receive translations in several views, and / or the fields change at some point - this will be a nightmare in reality.

we start actively google on the topic in django orm left joinorder to get an equivalent SQL query, but python / django way.

the first thing that caught my eye was to fake a Q object so that it turns into a LEFT JOIN: QLeftOuterJoin , and if you google a little longer and carefully you can notice that this solution is ancient as mammoths and since about 2010 it has not worked. Attempts to launch were unsuccessful.

Then in the issuance of Google we meet a certain " hack"out of the box over QuerySet.query, which by standard means allows you to embed a custom INNER / LEFT JOIN into a QuerySet and for our experimental selection the code will look like this:

        qs = Page.objects.filter(id__isnull=False) # костыль, иначе дальше не работает.
        for field in Page._meta.local_fields:
            if field.rel is not None and field.rel.to is Designations:
                join = qs.query.join(
                    (Page._meta.db_table, Translations._meta.db_table, ((field.name+'_id', 'des_id'),)),
                    nullable=True,  # это LEFT JOIN
                    join_field=Translations._meta.get_field_by_name('des')[0]
                )
                qs = qs.extra(
                    select={
                        field.name+"_translated": join+'.text'
                    },
                    where=[join+".lng=%s"],
                    params=['ru']
                )

I’ll tell you what is happening here: we sort through all the fields of the Page model and for each ForeignKey (Designations) we generate a unique JOIN. The docstring query.join says:

'join_cols' is a tuple of tuples containing columns to join on ((l_id1, r_id1), (l_id2, r_id2))

Those. With the 3rd element of the first agrument we can pass a lot of connecting fields for the condition, BUT we cannot do the filtering within the JOIN. As a result, where and param appeared in the qs.extra call, which, in turn, all our LEFT JOINs were broken into a regular INNER JOIN of the form:

SELECT ... FROM
        flatpages_pages, mlang_translations t1, mlang_translations_t2, .....
where
      t1.lng='ru' AND t2.lng='ru' AND ......

On the one hand, we can say - this is a feature, if one field is not translated, then hide the entire record and give 404yu. On the other hand, this is not at all the behavior that I want by default.

Well, okay, we go further with the normal django way described in the documentation: QuerySet.extra and write such an auxiliary function for automatically generating translations to the desired model:

def translate(model, lng, exclude=None):
    if exclude is not None and not isinstance(exclude, (list, tuple, set, frozenset,)):
        raise TypeError('exclude must be iterable')
    fields = []
    for field in model._meta.fields:
        if field.rel is not None and field.rel.to is Designations:
            if exclude is not None and field.name in exclude:
                continue
            fields.append(
                [field.name, map(lambda x: x[1], field.rel.get_joining_columns())[0]]
            )
    if not fields:
        return {}
    return dict(
        tables=[
            '"{trans._meta.db_table}" AS "trans_{pos}"'.format(trans=Translations, pos=pos)
            for pos, val in enumerate(fields)
        ],
        select={
            column[0] + "_translated": "trans_{0}.text".format(pos)
            for pos, column in enumerate(fields)
        },
        where=[
            "{model._meta.db_table}.{column[1]}=trans_{pos}.des_id and trans_{pos}.lng=%s".format(pos=pos,
                                                                                                  column=column,
                                                                                                  model=model)
            for pos, column in enumerate(fields)
        ],
        params=[lng] * len(fields)
    )

It works quite simply: iterates over all ForeignKey (Designations) from the transferred model and fills in the dictionary for transmission to QuerySet.extra, in the end we get the following call:

Page.objects.extra(**translate(Page, lng))

It looks beautiful, BUT these are the same eggs only in profile as in paragraph 2, only purebred INNER JOIN in the query text ...

upd: As promised, I will supplement the article with new and final results of the search for answers.

All of the above methods relied mainly on the documentation and some “hacks” without much going into the details of how it works in general.
So, if you look at the sources of the Query class in django.db.models.sql.query in particular, the join function itself can be seen not frail such work with the alias_map and join_map dictionaries. join_map is nothing more than a regular dictionary where the tuple is the key, which we pass as the 1st argument when we call join, and the value is a tuple of alias for accurate identification of the join in the request. alias_map uses the same alias as keys, and the handle of JOIN itself as the value, which will later be converted to SQL. The type and format of the descriptor is reduced to the form:

JoinInfo = namedtuple('JoinInfo',
                      'table_name rhs_alias join_type lhs_alias '
                      'join_cols nullable join_field')

The conversion to SQL itself is hard-wired, which completely eliminates the possibility of having some monkey patching in the bowels of django without adding the ability to generate JOINs smarter than

LEFT OUTER JOIN table alias ON main_table.field=alias.field

BUT, there is one significant but:
In pursuit of an elegant solution, I managed to overlook the fact that our left join from item 2 as a whole is fully working and meets all the requirements, but is slightly unsatisfied. As written in all SQL tutorials: LEFT JOIN substitutes NULL for all missing right samples, and therefore we can expand the WHERE clause so that n2 turns into an adequate JEFT JOIN equivalent to what is described at the very beginning of the article. And the code from n2 will look like this:

        qs = Page.objects.filter(id__isnull=False)  # иначе не запустится
        for field in Page._meta.local_fields:
            if field.rel is not None and field.rel.to is Designations:
                alias = qs.query.join(
                    (Page._meta.db_table, Translations._meta.db_table, ((field.name+"_id", 'des_id',),)),
                    nullable=True,
                    join_field=Translations._meta.get_field_by_name("des")[0]
                )
                qs = qs.extra(
                    select={field.name+"_translated": alias+'.text'},
                    where=["{0}.lng='{1}' or {0}.lng is null".format(alias, 'ru')],  #  добавил or lng is null
                )

at the output we get the data absolutely identical to SQL from Query.raw, it just made WHERE more complicated by moving the filtering here by language, which reduced the readability of the final SQL and I can’t say for sure how this affects the SQL execution speed at the database level relative to the original one. In general, we can say that the problem was solved, I was able to get rid of a huge number of unnecessary queries to the database by building more intelligent queries to the database without writing pure SQL, which in turn guarantees portability of the code between different DBMSs.

PS: I hope someone this little study will help sleep more soundly.
Sources can be found on github.com .

Tags:

Django-orm in search of LEFT JOIN

Also popular now: