Synchronous code in asynchronous Twisted, or a tale about how to cross a hedgehog with a snake

    Things are good

    Twisted is an asynchronous (event-driven) framework written in Python: a powerful tool for rapid development of network services (and not only those). It is built on the Reactor design pattern. Services written with Twisted are fast and reliable; the framework spares you from writing spaghetti code full of incomprehensible callbacks and offers handy abstractions (Deferred, Transport, Protocol, etc.). In other words, it makes the life of a backend developer better.

    But there are problems

    The main problem is that many reliable, well-tested, convenient libraries are built on synchronous Python modules (socket, os, ssl, time, select, thread, subprocess, sys, signal, etc.). They simply block the main process, and with it the reactor loop, and then trouble follows. Examples of such libraries are psycopg2, requests, mysql and others. In particular, psycopg2 is used by Django ORM as one of its database backends.

    What to do?

    There are three ways: hard, acceptable and good. The hard way is to implement an analogue of the library on top of Twisted. The acceptable way is to use deferToThread and run the synchronous code in separate threads (using the thread pool implemented in Twisted). The good way (in my opinion) is the subject of this note.
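
    The "acceptable" option can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the project: slow_lookup is a hypothetical stand-in for any blocking library call, and Twisted is imported inside main() so the helper itself can be used without it.

```python
import time


def slow_lookup(key, delay=0.1):
    """Hypothetical stand-in for a blocking library call (e.g. a DB query)."""
    time.sleep(delay)  # blocks only the thread it runs in
    return {"key": key, "value": key.upper()}


def main():
    # Imported lazily; this sketch assumes Twisted is installed.
    from twisted.internet import reactor, threads

    def start():
        # Run the blocking call in the reactor's thread pool.
        d = threads.deferToThread(slow_lookup, "spam")
        d.addCallbacks(
            lambda result: print("got", result),
            lambda failure: print("failed:", failure.getErrorMessage()),
        )
        # Stop the reactor whether the call succeeded or failed.
        d.addBoth(lambda _: reactor.stop())

    reactor.callWhenRunning(start)
    reactor.run()
```

    Calling main() starts the reactor, reports the result once the worker thread finishes, and then stops. The price of this approach is one OS thread per concurrent blocking call, bounded by the size of the thread pool.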

    Crossing the hedgehog with the snake


    Use green threads and events to switch context!



    What do we need for this?


    • Greenlets - lightweight "green" threads that run inside the main application process
    • Gevent - a framework that switches context between greenlets at the moment the running code would block
    • The reactor method deferToGreenlet, which wraps a greenlet in a Deferred
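
    The idea of several lightweight tasks sharing one process and handing control to each other can be sketched with the standard library alone. The example below uses generators as stand-ins for greenlets, which is an assumption made purely for illustration: real greenlets switch implicitly when patched blocking code would wait, with no yield in user code. The names worker and run_round_robin are hypothetical.

```python
from collections import deque


def worker(name, steps, log):
    """A task that gives up control after each step, like a greenlet that blocks."""
    for i in range(steps):
        log.append("%s step %d" % (name, i))
        yield  # stand-in for "blocked on I/O, let someone else run"


def run_round_robin(tasks):
    """Tiny scheduler: resume each task in turn until all of them finish."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, do not reschedule it
        queue.append(task)


log = []
run_round_robin([worker("a", 2, log), worker("b", 2, log)])
print(log)
```

    The two workers interleave step by step within a single thread, which is the behaviour gevent provides for ordinary-looking synchronous code.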


    An example of applying the technique in a real project


    I did not write my own reactor implementation capable of dispatching code to greenlets, because I found a ready-made solution, tested it and adopted it in the project. The reactor code can be taken from here.

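    To use geventreactor, install it when the application is initialized. Do this before anything imports twisted.internet.reactor, since only one reactor can be installed per process: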

    from geventreactor import install
    install()
    


    Now the following new functions and classes are available to us:
    __all__ = ['deferToGreenletPool', 'deferToGreenlet', 'callMultipleInGreenlet', 'waitForGreenlet', 'waitForDeferred',
               'blockingCallFromGreenlet', 'IReactorGreenlets', 'GeventResolver', 'GeventReactor', 'install']
    


    By analogy with deferToThread(f, *args, **kwargs) from twisted.internet.threads, you can call reactor.deferToGreenlet(f, *args, **kwargs), where f is a callable and *args and **kwargs are its arguments.

    For this to work, you must also monkey-patch the standard library modules:
    from gevent import monkey
    monkey.patch_all()
    


    After this, the core Python standard library modules are patched by gevent. See the gevent documentation for details.

    Now any library or code that imports these modules will, whenever it calls a blocking method or function, trigger the corresponding events in gevent. Callbacks attached to these events make it possible to switch context between greenlets.
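
    The mechanism underneath is ordinary monkey patching: a module attribute is replaced at runtime, so code that looks the function up through the module picks up the replacement. The sketch below shows only that mechanism, using the standard library and a hypothetical cooperative_sleep function. It is not what gevent does internally; gevent's replacements yield to other greenlets instead of merely recording the call.

```python
import time

yielded = []  # records each point where a real patch would switch greenlets


def cooperative_sleep(seconds):
    """Hypothetical replacement: note the request instead of blocking."""
    yielded.append(seconds)


def legacy_library_call():
    """Unmodified 'library' code that knows nothing about the patch."""
    time.sleep(5)  # resolved through the module at call time
    return "done"


original_sleep = time.sleep
time.sleep = cooperative_sleep      # the monkey patch
try:
    result = legacy_library_call()  # returns immediately, no 5 s stall
finally:
    time.sleep = original_sleep     # restore the real function

print(result, yielded)
```

    This also hints at why patching should happen as early as possible: code that ran "from time import sleep" before the patch keeps its reference to the original blocking function.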

    My project uses Django ORM to work with data in PostgreSQL. To keep ORM calls from blocking the process, you need a special backend that maintains a pool of database connections and switches between them. One such backend is django-db-geventpool.

    Using django-db-geventpool is not difficult; just follow its documentation.
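
    For orientation, a settings fragment might look roughly like the one below. The engine path and the MAX_CONNS option are given as recalled from the package README and should be treated as assumptions to verify against the version you install; the database name and credentials are placeholders.

```python
# settings.py (fragment): a sketch, not a verified configuration
DATABASES = {
    "default": {
        # Assumed engine path of the pooled PostgreSQL backend
        "ENGINE": "django_db_geventpool.backends.postgresql_psycopg2",
        "NAME": "example_db",      # placeholder
        "USER": "example_user",    # placeholder
        "PASSWORD": "change-me",   # placeholder
        "HOST": "",
        "PORT": "",
        # The pool manages connection lifetime itself, so Django's own
        # persistent connections and per-request transactions are turned off.
        "ATOMIC_REQUESTS": False,
        "CONN_MAX_AGE": 0,
        "OPTIONS": {
            "MAX_CONNS": 20,       # assumed option name: upper bound on pooled connections
        },
    }
}
```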

    What's next?


    The reactor.deferToGreenlet method returns a Deferred, which you can work with like any regular Deferred.

    For example, we have a model:

    class ExampleModel(models.Model):
        title = models.CharField(max_length=256)
    


    We want to fetch all the model instances and pass them to some processor inside the system. We can write something like:
    # list() forces the lazy QuerySet to run its query inside the greenlet
    d = reactor.deferToGreenlet(lambda: list(ExampleModel.objects.all()))
    


    And our code will not block the main process. At the moment when Django ORM calls cursor.execute() and waits for a response from the database driver, gevent will switch context to another greenlet.
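
    One caveat deserves a small illustration: a Django QuerySet is lazy, so the query is sent only when the QuerySet is evaluated (for example by list()). If the callable handed to the greenlet merely builds the QuerySet, the database wait happens later, wherever the result is iterated. The stdlib-only sketch below uses a hypothetical LazyRows class as a stand-in for a QuerySet (it is not Django code), and fetch_all is likewise an illustrative name.

```python
class LazyRows:
    """Hypothetical stand-in for a lazy QuerySet: work happens on iteration."""

    def __init__(self, rows, log):
        self._rows = rows
        self._log = log

    def __iter__(self):
        # In Django this is roughly where the SQL would be sent and awaited.
        self._log.append("query executed")
        return iter(self._rows)


def fetch_all(make_rows):
    """Build AND evaluate the lazy object, so the wait happens in the caller."""
    return list(make_rows())


log = []
lazy = LazyRows(["a", "b"], log)  # constructing it executes nothing
executed_on_build = list(log)     # snapshot: still empty at this point

rows = fetch_all(lambda: LazyRows(["a", "b"], log))
print(rows, log)
```

    In the Django case, the callable given to deferToGreenlet plays the role of fetch_all: it should return an evaluated list rather than a bare QuerySet.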

    What is the result?


    We can execute synchronous code inside Twisted without creating extra threads or processes and without blocking the reactor's event loop. The main thing is to follow the basic rule of asynchronous systems: pieces of code should not run for too long. gevent lets you force a context switch anywhere in the code where it is convenient; you just need to call gevent.sleep().
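
    A long CPU-bound loop can apply this rule by yielding every so often. The sketch below is one possible shape, with hypothetical helper names: process_all takes the "pause" action as a parameter so it can be exercised without gevent, while process_all_gevent plugs in gevent.sleep(0) and assumes gevent is installed.

```python
def process_all(items, handle, pause, every=100):
    """Apply handle() to each item, calling pause() after every `every` items."""
    results = []
    for index, item in enumerate(items, start=1):
        results.append(handle(item))
        if index % every == 0:
            pause()  # with gevent: lets other greenlets (and the reactor) run
    return results


def process_all_gevent(items, handle, every=100):
    """Same loop, yielding to other greenlets via gevent.sleep(0)."""
    # Imported lazily; this sketch assumes gevent is installed.
    import gevent

    return process_all(items, handle, lambda: gevent.sleep(0), every)


# Exercise the loop with a stub pause that just counts the yield points.
pauses = []
doubled = process_all(range(250), lambda x: x * 2, lambda: pauses.append(1))
print(len(doubled), len(pauses))
```

    The value of every is a trade-off: a smaller number keeps the event loop more responsive, while a larger one spends less time on switching.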
