Once again about multithreading and Python

    As you know, the main Python implementation, CPython ( python.org ), uses a Global Interpreter Lock (GIL). It allows only one Python thread to run at a time; the rest must wait until the GIL is handed over to them.

    A colleague, Qualab, recently published a brisk article on Habr proposing an innovative approach: create a separate Python subinterpreter for each operating system thread, so that all the subinterpreters can run in parallel. That is, the GIL supposedly does not get in the way at all.

    The idea is fresh, but it has one significant drawback: it does not work...

    Let me first examine the GIL in a bit more detail, and then move on to analyzing the author’s mistakes.

    GIL


    I will briefly describe the details of the GIL that matter for this discussion, as implemented in Python 3.2+ (a more detailed treatment of the subject can be found here ).

    Version 3.2 is chosen for concreteness and to keep the presentation short. For 1.x and 2.x the differences are not significant.

    • The GIL, as the name implies, is a synchronization object. It is designed to block simultaneous access to the internal state of Python from different threads.
    • It can be held by some thread or be free (not held).
    • Only one thread can hold the GIL at a time.
    • There is exactly one GIL for the entire process in which Python is running. I emphasize once again: the GIL is not hidden inside a subinterpreter or anywhere else; it is implemented as a set of static variables shared by all code in the process.
    • From the GIL's point of view, every thread that makes Python C API calls must have a PyThreadState structure. The GIL either points to one PyThreadState (the running one) or points to nothing (the GIL is released, and threads run independently and in parallel).
    • After the interpreter starts, the only Python C API operation allowed without holding the GIL is acquiring it. Everything else is forbidden (strictly speaking, Py_INCREF is also safe, but Py_DECREF can delete the object, which can lead to uncontrolled, unprotected concurrent changes to the very internal state of Python that the GIL is meant to protect). Debug builds contain more checks for incorrect GIL usage; in release builds some of them are disabled to improve performance.
    • The GIL is switched by a timer (5 ms by default) or by an explicit call (PyThreadState_Swap, PyEval_RestoreThread, PyEval_SaveThread, PyGILState_Ensure, PyGILState_Release, etc.).
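
    The timer from the last bullet can be inspected from Python code. This is a minimal sketch, assuming CPython 3.2 or later, where sys.getswitchinterval() and sys.setswitchinterval() are available; the value 0.05 is only an illustrative choice, not a recommendation.

```python
import sys

# Read the current switch interval, in seconds (0.005 by default in CPython).
default_interval = sys.getswitchinterval()
print(f"default switch interval: {default_interval * 1000:.1f} ms")

# Ask the interpreter to request a GIL hand-over less often.
# 0.05 s is an arbitrary value chosen for illustration.
sys.setswitchinterval(0.05)
changed_interval = sys.getswitchinterval()
print(f"changed switch interval: {changed_interval * 1000:.1f} ms")

# Restore the original value so the rest of the program is unaffected.
sys.setswitchinterval(default_interval)
```

    Changing the interval only affects how often a running thread is asked to give up the GIL; there is still one GIL, and still only one thread executes Python code at a time.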


    As you can see, you can run code in parallel, but you cannot make Python C API calls while doing so (this also applies to executing code written in Python, of course).

    Here, “cannot” means (especially in the release build that everyone uses) that the behavior is unstable. It may not break right away. It may work fine for one program and then, after a small harmless change in the executed Python code, terminate with a segmentation fault and a pile of side effects.

    Why sub-interpreters do not help


    What does the colleague do? (The link to the archive with the code is in his article; I mirrored the source code on gist: gist.github.com/4680136 )

    In the main thread, the GIL is released immediately through PyEval_SaveThread() . The main thread no longer works with Python: it creates several worker threads and waits for them to complete.

    Each worker thread acquires the GIL . The code came out odd, but that does not matter now. The main thing is that the GIL is firmly in our grip.

    And immediately, the parallel execution of the worker threads turns into sequential execution. There was no need to build the construction with subinterpreters: their usefulness in this context is exactly zero, as expected.

    I do not know why the author did not notice this right away, before publishing the article, and why he then persisted for a long time, preferring to call black white.

    Returning to parallel execution is simple - you need to release the GIL. But then it will not be possible to work with the Python interpreter.

    If you ignore the ban anyway and call the Python C API without holding the GIL, the program will break, though not necessarily right away, and not without unpleasant side effects. If you want to shoot yourself in the foot in a particularly sophisticated way, this is your chance.

    I repeat: there is one GIL for the whole process, not one per interpreter or subinterpreter. While one thread holds the GIL, every other thread that wants to execute Python code is suspended.
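
    The same effect can be observed without any C code. The sketch below is not the colleague's program: it is a pure-Python analogue with hypothetical names (burn_cpu, run_sequential, run_threaded) that runs the same CPU-bound loop first sequentially and then in several threads. On a regular GIL-enabled CPython build the two timings are expected to be roughly equal; the exact numbers depend on the machine, so none are claimed here.

```python
import threading
import time

N = 1_000_000   # illustrative amount of work per task
THREADS = 4     # illustrative number of worker threads


def burn_cpu(n, results, index):
    """Pure-Python loop: the thread needs the GIL for every iteration."""
    total = 0
    for i in range(n):
        total += i
    results[index] = total


def run_sequential():
    """Run all tasks one after another in the calling thread."""
    results = [None] * THREADS
    start = time.perf_counter()
    for i in range(THREADS):
        burn_cpu(N, results, i)
    return time.perf_counter() - start, results


def run_threaded():
    """Run the same tasks in separate threads that compete for the GIL."""
    results = [None] * THREADS
    threads = [
        threading.Thread(target=burn_cpu, args=(N, results, i))
        for i in range(THREADS)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results


sequential_time, sequential_results = run_sequential()
threaded_time, threaded_results = run_threaded()
print(f"sequential: {sequential_time:.2f} s, threaded: {threaded_time:.2f} s")
```

    The threads produce the same results as the sequential run, but because each of them must hold the GIL to execute bytecode, adding threads does not shorten the total time.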

    Conclusion


    Whether you like the GIL or not, it is already there, and I highly recommend learning how to work with it correctly.

    1. Either acquire the GIL and call Python C API functions.
    2. Or release it and do whatever you want, but do not touch Python in that mode.
    3. Parallel execution is achieved by running several processes at once, through multiprocessing or in some other way. The details of working with processes are beyond the scope of this article.


    The rules are simple; there are no exceptions or workarounds.
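
    Rule 2 is visible from Python code as well: blocking calls in the standard library release the GIL while they wait. A minimal sketch, assuming CPython, where time.sleep() releases the GIL for the duration of the sleep; the names blocking_wait, DELAY and THREADS are hypothetical and chosen for illustration.

```python
import threading
import time

DELAY = 0.2    # seconds each thread spends blocked (illustrative)
THREADS = 4    # illustrative number of threads


def blocking_wait():
    # time.sleep releases the GIL while waiting, so other threads can run.
    # No Python objects are touched during the wait itself.
    time.sleep(DELAY)


threads = [threading.Thread(target=blocking_wait) for _ in range(THREADS)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"{THREADS} threads, {DELAY} s each, total: {elapsed:.2f} s")
```

    The waits overlap, so the total is close to a single DELAY rather than THREADS times DELAY. This works only because the threads do nothing with the interpreter while the GIL is released.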
