Extra join in SQL queries

While debugging the performance of a small project, but with a fairly large base, I ran into an unpleasant special effect.
Django, when fetching with conditions for foreign keys associated with checking for NULL, generates queries containing JOINs for each such key. For example, for a model

class ForumPlugin(models.Model):
    name = models.CharField(
        null=False, 
        blank=False, 
        max_length=50, 
        unique=True, 
        verbose_name=_('name')
    )
class Thread(MPTTModel):   
    parent = TreeForeignKey(
        'self', 
        null=True, 
        blank=True, 
        related_name='children', 
        verbose_name=_('parent thread')
    )
    plugin = models.ForeignKey(
        ForumPlugin, 
        null=True, 
        blank=True, 
        related_name='threads', 
        verbose_name=_('plugin')
    )

When fetching

Thread.objects.filter(plugin__isnull=True, parent__isnull=True)

Djando generates this request:

SELECT `forum_thread`.`id`, `forum_thread`.`parent_id`, `forum_thread`.`plugin_id`, `forum_thread`.`lft`, `forum_thread`.`rght`, `forum_thread`.`tree_id`, `forum_thread`.`level` FROM `forum_thread` LEFT OUTER JOIN `forum_thread` T2 ON (`forum_thread`.`parent_id` = T2.`id`) LEFT OUTER JOIN `forum_forumplugin` ON (`forum_thread`.`plugin_id` = `forum_forumplugin`.`id`) WHERE (T2.`id` IS NULL AND `forum_forumplugin`.`id` IS NULL AND ) ORDER BY `forum_thread`.`id

Naturally, the execution time of such a request increases by several orders of magnitude, which can be critical with large tables. So on my project, on a table of the order of 20-30k records, such a sample is performed from 100ms to 300ms instead of 1ms, which increases the page generation time by half.

Unfortunately, the ORM developers have known the bug for four years and has a long and sad history .

Currently present in all stable versions, including 1.4.3. It is assumed that in 1.5 it will finally be fixed.

As a workaround, it is advised to use double negation:

Thread.objects.exclude(plugin__isnull=False, parent__isnull=False)

but I did not succeed in practice in this way to get rid of the problem. Going directly to the parent_id field also does not help.

Be careful when designing models and try to take into account this feature of Django, and avoid fetching foreign keys using NULL conditions.

UPD: Found a solution:

Category.objects.extra(where=['parent_id IS NULL'])

Using raw sql is certainly bad form, but apparently this is the only solution.

Also popular now: