Django Asynchronous Jobs with Celery

    Greetings!
    I think most Django developers have heard of Celery, an asynchronous task execution system, and many even use it actively.

    About a year ago there was a good article on Habr that explained how to use Celery. However, as its conclusion mentioned, Celery 2.0 has since been released (the current stable version is 2.2.7), in which the Django integration was moved into a separate package, among other changes.

    This article will be useful primarily for beginners who are starting out with Django and need something to perform asynchronous and/or periodic tasks in their system (for example, cleaning up expired sessions). I will show how to install and configure Celery to work with Django from start to finish, along with some other useful settings and pitfalls.


    First of all, check that the python-setuptools package is present in the system, and install it if it is missing:

    aptitude install python-setuptools

    Celery Installation

    Celery itself is very easy to install:

    
    easy_install Celery

    More details in the official documentation: http://celeryq.org/docs/getting-started/introduction.html#installation

    In the article linked at the beginning, MongoDB was used as the backend; here I will show how to use the same database in which the rest of the Django application stores its data as both the backend and the message broker.

    django-celery

    Install the django-celery package :
    
    easy_install django-celery
    

    As already mentioned, django-celery provides convenient integration of Celery and Django. In particular, it uses the Django ORM as a backend for storing task state and results, and it automatically discovers and registers Celery tasks in the Django applications listed in INSTALLED_APPS.

    After installing django-celery, you need to configure it:
    • add djcelery to the INSTALLED_APPS list:
      INSTALLED_APPS += ("djcelery", )
      

    • add the following lines to the Django settings file settings.py:
      import djcelery
      djcelery.setup_loader()
      

    • Create the necessary tables in the database:
      
      ./manage.py syncdb
      

    • Set the database as the storage for the periodic task schedule in settings.py:
      CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
      

      With this option, we can add/remove/edit periodic tasks through the Django admin panel.
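
      The same schedule entries can also be created programmatically through the djcelery models; a minimal sketch, assuming the hypothetical task path myapp.tasks.clean_sessions:

      from djcelery.models import CrontabSchedule, PeriodicTask

      # Run every day at 00:00 (the fields take crontab-style strings)
      schedule, _ = CrontabSchedule.objects.get_or_create(minute="0", hour="0")
      PeriodicTask.objects.get_or_create(
          name="Clean up expired sessions",
          task="myapp.tasks.clean_sessions",  # made-up task path
          crontab=schedule,
      )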

    When using mod_wsgi, add the following lines to the WSGI configuration file:
    import os
    os.environ["CELERY_LOADER"] = "django"
    
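
    For context, here is a minimal sketch of how these lines fit into a complete mod_wsgi script for a Django project of this generation (the project path and settings module name are assumptions):

    import os
    import sys

    # Make the project importable and point Django at its settings
    sys.path.append("/var/www/myproject")  # assumed project path
    os.environ["DJANGO_SETTINGS_MODULE"] = "settings"
    os.environ["CELERY_LOADER"] = "django"

    import django.core.handlers.wsgi
    application = django.core.handlers.wsgi.WSGIHandler()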


    django-kombu

    Now we just need a suitable message broker for Celery. In this article I will use django-kombu, a package that allows you to use the Django database as a message store for Kombu (a Python implementation of AMQP).
    Install the package:
    
    easy_install django-kombu
    

    Configure it:
    • add djkombu to the INSTALLED_APPS list:
      INSTALLED_APPS += ("djkombu", )
      

    • Set djkombu as the broker in settings.py:
      BROKER_BACKEND = "djkombu.transport.DatabaseTransport"
      

    • Create the necessary tables in the database:
      
      ./manage.py syncdb
      

    Launching

    Start the celery and celerybeat processes:
    (celeryd alone is enough to run ordinary asynchronous tasks; to run periodic scheduled tasks, you also need to start celerybeat.)
    • On Linux, both processes can be started simultaneously using the -B switch:
      
      # ./manage.py celeryd -B
      -------------- celery@test v2.2.7
      ---- **** -----
      --- * *** * -- [Configuration]
      -- * - **** --- . broker: djkombu.transport.DatabaseTransport://guest@localhost0/
      - ** ---------- . loader: djcelery.loaders.DjangoLoader
      - ** ---------- . logfile: [stderr]@WARNING
      - ** ---------- . concurrency: 16
      - ** ---------- . events: OFF
      - *** --- * --- . beat: ON
      -- ******* ----
      --- ***** ----- [Queues]
      -------------- . celery: exchange:celery (direct) binding:celery
      

    • On Windows, celery and celerybeat must be run separately:
      
      ./manage.py celeryd --settings=settings
      ./manage.py celerybeat
      

      The --settings option may be required if the following exception occurs:
      
      ImportError: Could not import settings 'app_name.settings' (Is it on sys.path?): No module named app_name.settings
      

      Details about the problem: http://groups.google.com/group/celery-users/browse_thread/thread/43a95be6865a636/d91ab2492885f3d4?lnk=gst&q=settings#d91ab2492885f3d4
      A complete list of known problems with Celery on Windows: http://celeryproject.org/docs/faq.html#windows

    After starting, we can see what the periodic tasks look like in the Django admin panel:

    [screenshot: periodic tasks in the Django admin]

    If you use something other than the Django ORM as the Celery backend (RabbitMQ, for example), the Django admin panel can also show the status of all other tasks; it looks something like this:

    [screenshot: task states in the Django admin]
    Details: http://stackoverflow.com/questions/5449163/django-celery-admin-interface-showing-zero-tasks-workers

    UPDATE: I'm adding a few words about daemonization, since it may not work on the first try.

    Run celery as a service

    Download the celery init script from https://github.com/ask/celery/tree/master/contrib/generic-init.d/ and place it in the /etc/init.d directory with the appropriate permissions.
    In the /etc/default directory, create a celeryd file from which the script will take its startup settings:
    
    # Where the Django project is.
    CELERYD_CHDIR="/var/www/myproject"
    # Path to celeryd
    CELERYD_MULTI="$CELERYD_CHDIR/manage.py celeryd_multi"
    CELERYD_OPTS="--time-limit=300 --concurrency=8 -B"
    CELERYD_LOG_FILE=/var/log/celery/%n.log
    # Path to celerybeat
    CELERYBEAT="$CELERYD_CHDIR/manage.py celerybeat"
    CELERYBEAT_LOG_FILE="/var/log/celery/beat.log"
    CELERYBEAT_PID_FILE="/var/run/celery/beat.pid"
    CELERY_CONFIG_MODULE="settings"
    export DJANGO_SETTINGS_MODULE="settings"
    

    The --concurrency option sets the number of celery worker processes (by default, this equals the number of processors).
    After that, you can start celery using service:
    
    service celeryd start
    
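
    On Debian-based systems, the service can also be registered to start at boot (assuming the init script was saved as /etc/init.d/celeryd):

    update-rc.d celeryd defaults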

    More details: docs.celeryproject.org/en/latest/tutorials/daemonizing.html#daemonizing

    Working with celery

    After installing django-celery, tasks are automatically registered from the tasks.py modules of all applications listed in INSTALLED_APPS. In addition to the tasks modules, you can specify extra modules with the CELERY_IMPORTS setting:
    CELERY_IMPORTS=('myapp.my_task_module',)
    
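
    For reference, a minimal task inside such an extra module (myapp/my_task_module.py here) might look like this; send_notification and its body are made up for illustration:

    from celery.task import task

    @task(ignore_result=True)
    def send_notification(user_id):
        # the actual work (e.g. sending an e-mail) would go here
        pass

    Such a task is then called asynchronously with send_notification.delay(42).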

    It is also useful to enable the CELERY_SEND_TASK_ERROR_EMAILS option, which makes Celery e-mail all task errors to the addresses listed in the ADMINS setting.
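    For example (the address is a placeholder):

    CELERY_SEND_TASK_ERROR_EMAILS = True
    ADMINS = (
        ("Admin", "admin@example.com"),
    )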

    Writing tasks for celery has not changed much since the previous article:
    from datetime import datetime
    from django.contrib.sessions.models import Session
    from celery.task import periodic_task
    from celery.schedules import crontab

    @periodic_task(ignore_result=True, run_every=crontab(hour=0, minute=0))
    def clean_sessions():
        # Remove sessions that expired before now
        Session.objects.filter(expire_date__lt=datetime.now()).delete()
    

    The only difference is that decorators should now be imported from celery.task; the celery.decorators module is deprecated.

    A couple of performance notes:
    • If a task does not return a result, set the ignore_result=True option (see the sketch after this list).
    • Disable rate limits if your tasks do not use them:
      CELERY_DISABLE_RATE_LIMITS = True
      
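
    To illustrate why ignore_result matters: with a result backend such as the Django ORM, every task's return value is written to storage so it can be fetched later, and a task whose result nobody reads pays that cost for nothing. A hypothetical example:

    from celery.task import task

    @task  # the result is stored in the backend and can be fetched
    def add(x, y):
        return x + y

    # res = add.delay(2, 3)
    # res.get(timeout=10)  # returns 5; not possible with ignore_result=True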

    More about these and other Celery tips: http://celeryproject.org/docs/userguide/tasks.html#tips-and-best-practices
