Django Initialization Stage Optimization
If your Django project runs on synchronous workers and you restart them periodically (for example, with gunicorn's --max-requests option), it is useful to know that by default, after each worker restart, the first request it handles takes much longer than the subsequent ones. In this article, I'll describe how I solved this and other problems that cause abnormal delays on random requests.
The examples in this article are for the gunicorn WSGI server, but they are relevant for any way of running a project on synchronous workers. The same applies to uWSGI and mod_wsgi.
We recently moved our Django project to a Kubernetes cluster. Kubernetes has readiness/liveness probes that poll every running instance of the WSGI server (in our case, gunicorn) at a specified HTTP endpoint. Ours is /api/v1/status:
import logging

import django.db
from django.core.cache import cache
from rest_framework import status, views
from rest_framework.response import Response

log = logging.getLogger(__name__)


class StatusView(views.APIView):
    @staticmethod
    def get(request):
        overall_ok = True

        # Check that the database answers a trivial query.
        try:
            with django.db.connection.cursor() as cursor:
                cursor.execute('SELECT version()')
                cursor.fetchone()
        except Exception:
            log.exception('Database failure')
            db = 'fail'
            overall_ok = False
        else:
            db = 'ok'

        # Check that the cache (Redis) accepts writes.
        try:
            cache.set('status', 1)
        except Exception:
            log.exception('Redis failure')
            redis = 'fail'
            overall_ok = False
        else:
            redis = 'ok'

        if overall_ok:
            s = status.HTTP_200_OK
        else:
            s = status.HTTP_500_INTERNAL_SERVER_ERROR

        return Response({
            'web': 'ok',
            'db': db,
            'redis': redis,
        }, status=s)
Before moving to Kubernetes, we had Zabbix, which requested /api/v1/status through the load balancer once a minute, and this health check practically never failed. But after the move, when the checks started hitting each gunicorn instance individually and more frequently, it suddenly turned out that we sometimes did not fit into the 5-second timeout.
Nevertheless, everything kept working and users had no problems, so I did not pay special attention to it, but I did set myself the background task of figuring out what was going on. Here is what I managed to find out:
By default, gunicorn starts a master process, which forks the number of processes specified by the --workers argument. The WSGI module passed to gunicorn as the main argument is loaded by each worker after the fork. But there is a --preload option: if it is set, the WSGI module is loaded BEFORE the fork. Hence the rule:
Always run gunicorn in production with the --preload option, which reduces the initialization time of each worker. Most of the initialization then happens only in the master process, and already initialized worker processes are forked from it.
I repeat: most of these optimizations make sense if your Django project runs on synchronous workers and you restart them periodically (--max-requests).
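For reference, here is a minimal sketch of what these options look like in a gunicorn config file (gunicorn.conf.py); the specific numbers are placeholders, not recommendations:

# gunicorn.conf.py
workers = 4            # number of worker processes forked from the master
max_requests = 1000    # restart a worker after it has handled this many requests
preload_app = True     # load the WSGI module in the master, before forking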
Nevertheless, it turned out that --preload alone is not enough: the first request to a freshly launched worker still takes longer than the subsequent ones. Tracing showed that preloading the WSGI module does little, and most of Django is initialized only during the first request. So a head-on solution was born:
During WSGI initialization, send a fake request to the health/status endpoint so that as many subsystems as possible are initialized right away.
For example, I added the following to wsgi.py:
# wsgi.py, below the standard `application = get_wsgi_application()` line

# make request to /api/v1/status to prepare everything for first user request
def make_init_request():
    from django.conf import settings
    from django.test import RequestFactory

    f = RequestFactory()
    request = f.request(**{
        'wsgi.url_scheme': 'http',
        'HTTP_HOST': settings.SITE_DOMAIN,
        'QUERY_STRING': '',
        'REQUEST_METHOD': 'GET',
        'PATH_INFO': '/api/v1/status',
        'SERVER_PORT': '80',
    })

    def start_response(*args):
        pass

    application(request.environ, start_response)


if os.environ.get('WSGI_FULL_INIT'):
    make_init_request()
As a result, workers started initializing an order of magnitude faster, because they are forked already fully prepared for the next request.
On the traces, the initialization problems stopped... almost. To my shame, I did not know about this behavior: it turns out that, by default, Django reconnects to the database on every request. The CONN_MAX_AGE setting is responsible for this, and only (?) for historical reasons does it make your Django application work like a PHP script starting from zero each time. So the rule is:
Add CONN_MAX_AGE=None to the database settings in your Django configuration so that connections are persistent.
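A minimal sketch of the corresponding settings.py fragment; the engine and connection details are placeholders for your own configuration:

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',         # placeholder
        'USER': 'myuser',       # placeholder
        'HOST': 'db.internal',  # placeholder
        'CONN_MAX_AGE': None,   # keep connections open indefinitely (the default, 0, reconnects on every request)
    }
}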
I would not even have noticed this, but for some reason the psycopg2.connect call sometimes hangs for exactly 5 seconds. I never fully got to the bottom of it: a script running in parallel, calling this function once every 10 seconds, worked stably and connected to the database in under a second for the entire time it was running (a couple of weeks).
However, these two rules conflict with each other, because together they cause connections to the database and cache to be created in the master process before the fork. Child processes inherit the open sockets from the master, which leads to undefined behavior when several processes work with the same socket simultaneously. Therefore, all connections need to be closed before the fork:
# Close connections to database and cache before or after forking.
# Without this, child processes will share these connections and this is not supported.
def close_network_connections():
    from django import db
    from django.core import cache
    from django.conf import settings

    for conn in db.connections:
        db.connections[conn].close()

    # django-redis only actually closes its connections when this setting is enabled,
    # so enable it temporarily for the duration of close_caches()
    django_redis_close_connection = getattr(settings, 'DJANGO_REDIS_CLOSE_CONNECTION', False)
    settings.DJANGO_REDIS_CLOSE_CONNECTION = True
    cache.close_caches()
    settings.DJANGO_REDIS_CLOSE_CONNECTION = django_redis_close_connection


if os.environ.get('WSGI_FULL_INIT'):
    make_init_request()

    # in case wsgi module preloaded in master process (i.e. `gunicorn --preload`)
    if os.environ.get('WSGI_FULL_INIT_CLOSE_CONNECTIONS'):
        close_network_connections()
Thus, when using --preload together with WSGI_FULL_INIT, you also need to set WSGI_FULL_INIT_CLOSE_CONNECTIONS.
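One possible way to set both variables (just an assumption about your deployment; any other way of setting environment variables works too) is directly in the gunicorn config file via its raw_env setting:

# gunicorn.conf.py
preload_app = True
raw_env = [
    'WSGI_FULL_INIT=1',
    'WSGI_FULL_INIT_CLOSE_CONNECTIONS=1',
]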
As a result, the abnormal delays were completely eliminated. But there are a couple of edge cases in which they can still occur:
If all the workers restart at the same time. This is quite likely, because if requests are distributed approximately evenly between the workers, they hit max-requests at approximately the same moment. Therefore:
Run gunicorn with max-requests-jitter so that the workers do not all restart at the same time, even though each individual restart is fast enough.
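In config-file terms this is a single extra line next to max_requests (the value is a placeholder):

# gunicorn.conf.py
max_requests = 1000
max_requests_jitter = 100  # each worker restarts after max_requests plus a random offset up to this value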
Also, a delay can occur during the first request, when connections to the database and other external systems are created.
This can also be solved, but I do not see how to write it in a way that is independent of the WSGI server in use. In gunicorn you can add a post_worker_init handler and call make_init_request() there once more; the worker will then be 100% ready before it receives its first request. To avoid extra complexity, we decided to do without this for now, since in practice the delays are already gone. A sketch of such a hook is shown below.
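For illustration only, here is what that hook could look like in gunicorn.conf.py; the myproject.wsgi import path is a placeholder, and, again, we do not actually use this:

# gunicorn.conf.py (sketch, not used in our setup)
def post_worker_init(worker):
    # Runs inside the worker process right after the fork, so the warm-up
    # request re-creates per-worker connections instead of inheriting them.
    from myproject.wsgi import make_init_request  # placeholder project path
    make_init_request()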