Toward a "Kernel Python"

    Hello, Habr! I present to you a translation of the article "Toward a 'Kernel Python'" by Glyph Lefkowitz (creator of the Twisted framework).

    More details under the cut.

    The magic of minimizing the standard library


    Under the influence of Amber Brown's talk at last month's Python Language Summit (referring to her May report "Batteries Included, But They're Leaking"; translator's note), Christian Heimes has continued his work on slimming down the Python standard library and has drafted PEP 594, a proposal to explicitly remove its obsolete and unmaintained pieces.

    The arrival of PEP 594 ("Removing dead batteries from the standard library") is great news for Pythonists, especially those who maintain the standard library and will now have a smaller front of work. A brief tour through the PEP's gallery of modules deprecated or slated for removal speaks for itself (the sunau, xdrlib, and chunk modules are my personal favorites). The standard Python library contains many useful modules, but it also houses a veritable necropolis of code, a towering monument to obsolescence that threatens to bury its maintainers at any moment.

    However, I believe this PEP takes a mistaken approach. The standard library is currently maintained in tandem with CPython, by the same developers. Large pieces of it have been kept around in the vague hope that they will someday be useful to someone. In the PEP itself this principle shows through in the defense of the colorsys module. Why not remove it? Answer: "The module is needed to convert CSS colors between coordinate systems (RGB, YIQ, HLS and HSV). [It] does not impose additional costs on core development."
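    For readers who have never touched the module in question: colorsys is a small set of pure-Python conversion functions, and the snippet below shows essentially all there is to it (this uses the real colorsys API; the sample values are my own).

        # colorsys ships pure-Python conversions between RGB and a few
        # other color coordinate systems; channels are floats in 0.0..1.0.
        import colorsys

        h, s, v = colorsys.rgb_to_hsv(1.0, 0.5, 0.0)  # an orange tone
        r, g, b = colorsys.hsv_to_rgb(h, s, v)        # round-trips back

        print(h, s, v)  # 0.0833..., 1.0, 1.0
        print(r, g, b)  # 1.0, 0.5, 0.0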

    There was a time when Internet access was scarce, and preloading Python with a ton of material may well have been a good idea; nowadays, though, a module for converting colors between coordinate systems is one pip install command away.

    Why haven't you reviewed my pull request?


    So let's examine the claim: does a tiny module like colorsys really impose no "additional costs on core development"?

    The core developers have their hands full simply trying to maintain the huge and ancient C codebase that is CPython itself. As Mariatta Wijaya said in her talk at North Bay Python, the question core developers hear most often is: "Why haven't you looked at my pull request?" And the answer? It is easier to just ignore those pull requests; such is the life of a core developer!

    One might ask: doesn't Twisted have the same problem? Twisted is also a large collection of loosely coupled modules, a kind of standard library of networking. Aren't all those clients and servers for SSH, IMAP, HTTP, TLS and so on likewise squeezed into a single package?

    I am forced to answer: yes. Twisted is monolithic because it comes from the same historical era as CPython, when installing components was genuinely hard. For that reason I sympathize with CPython's position.

    Ideally, each sub-project in Twisted would at some point become a separate project with its own repository, its own continuous integration (CI), its own website and, of course, its own more focused maintainers. We are already, slowly but surely, splitting out the projects where a natural boundary can be drawn. Some pieces that began their life inside Twisted, such as constantly and incremental, have already been split out; deferred and filepath are in the process. Other projects, such as klein and treq, have lived separately from the start. We will do more of this once we work out how to lower the cost of setting up and maintaining CI and release infrastructure for each of them.

    But is Twisted's monolithic nature the most pressing, or even a serious, problem for the project? Let's try to gauge it.

    At the time of this writing, Twisted has 5 pull requests waiting in the review queue. The average time a ticket spends under review is, roughly speaking, four and a half days. The oldest ticket in the queue is dated April 22, which means less than two months have passed since the oldest unreviewed pull request was submitted.

    It is always difficult to find enough maintainers and enough time to respond to pull requests. At times it feels that we, too, hear the question "Why haven't you reviewed my pull request?" too often. We do not always manage perfectly, but on the whole we cope; the queue fluctuates between 0 and 25 or so in the unluckiest months.

    And how does the CPython core compare with these numbers? Looking at GitHub, you can see that at the moment 429 pull requests are awaiting review. The oldest of them has been waiting since February 2, 2018, that is, almost 500 days.

    How much of that is interpreter trouble and how much is stdlib trouble? A review backlog is obviously a problem either way, but would removing the stdlib help?

    For a quick and unscientific assessment, I looked through the first (oldest) page of pull requests. By my subjective count, of those 25 pull requests, 14 concerned the standard library, 10 concerned the language core or interpreter code, and one fixed a small documentation problem. Extrapolating from that proportion, I would venture that somewhere around half of the unreviewed pull requests are associated with standard library code.
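    A census of this kind is easy to repeat; here is a rough sketch using GitHub's public search API (unauthenticated and rate-limited, and the stdlib-versus-core classification still has to be done by eye):

        # Count CPython's open pull requests and list the 25 oldest,
        # via GitHub's public search API (unauthenticated, rate-limited).
        import json
        from urllib.request import Request, urlopen

        url = ("https://api.github.com/search/issues"
               "?q=repo:python/cpython+is:pr+is:open"
               "&sort=created&order=asc&per_page=25")

        request = Request(url, headers={"Accept": "application/vnd.github+json"})
        with urlopen(request) as response:
            data = json.load(response)

        print("open pull requests:", data["total_count"])
        for pr in data["items"]:  # the oldest page of results
            print(pr["created_at"], pr["title"])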

    So the first reason the core CPython team needs to stop maintaining the standard library is that they literally do not have the physical capacity to maintain it. Or, to put it another way: they are not maintaining it now, and all that remains is to admit this and start dividing up the work.

    It is a fact that none of CPython's open pull requests touch the colorsys module; in that sense it truly imposes no costs on core development. Rather, core development imposes costs on it. If I wanted to bring colorsys up to date (perhaps give it a Color object, perhaps add support for integer color models), I would most likely have to wait those 500 days or more.
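    By way of illustration, here is the sort of thing I mean; the Color class below is invented for this example, a sketch of what a freely evolving package could grow, not a real colorsys API:

        # Hypothetical: a richer API colorsys could grow on PyPI, where
        # the release cadence is not chained to the interpreter's.
        from dataclasses import dataclass
        import colorsys

        @dataclass(frozen=True)
        class Color:
            r: float  # each channel as a float in 0.0..1.0
            g: float
            b: float

            @classmethod
            def from_ints(cls, r, g, b):
                # Integer color model support: 0..255 per channel.
                return cls(r / 255, g / 255, b / 255)

            def as_hsv(self):
                return colorsys.rgb_to_hsv(self.r, self.g, self.b)

        print(Color.from_ints(255, 128, 0).as_hsv())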

    All of this makes code in the standard library harder to change, which in turn dampens users' interest in contributing. CPython's infrequent releases also slow library development down and blunt the benefit of user feedback. It is no coincidence that nearly every popular standard library module has an actively maintained third-party alternative, and this is through no fault of the stdlib maintainers: the whole process is tuned to stagnate everything but the most heavily used stdlib modules.

    New environments, new requirements


    Perhaps even more important, welding the standard library to CPython puts CPython itself in a privileged position relative to other implementations of the language.

    Podcast after podcast, talk after talk tells us that for Python's success to continue, it must grow into new areas: the front end above all, as well as mobile clients, embedded systems and console games.

    These environments require one or both of the following:

    • a completely different runtime (see Brython or MicroPython);
    • a modified, stripped-down version of the standard library.

    In all these cases, the stumbling block is figuring out which modules have to be dropped from the standard library. They can only be found by trial and error; worse, the process is completely different from the way dependencies are normally declared in a Python application. There is no install_requires declaration in setup.py to report that a library relies on a stdlib module which the target Python runtime may have left out to save space; the sketch below shows the asymmetry.
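    A minimal setup.py makes the point (the package names here are made up):

        # Third-party dependencies are declared explicitly...
        from setuptools import setup

        setup(
            name="example-library",          # hypothetical package
            version="1.0",
            py_modules=["example"],
            install_requires=["requests"],   # visible to installers
        )

        # ...but nothing records that example.py also does
        # `import colorsys`, so a stripped-down runtime that omits
        # that stdlib module fails only at import time.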

    The problem can arise even when everything in use is stock Python on a Linux installation. Server and desktop Linux distributions have just as much need for a smaller core Python package, so the standard library they ship is already fairly truncated. Python code that assumes the full stdlib may then fail to run, to the point where even pip install does not work.
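    In practice, code that must run on such truncated installations ends up probing for stdlib modules by hand, something like this defensive-import sketch (lzma is one module that distributions sometimes package separately; the error message is invented):

        # Probe at runtime for a stdlib module the platform may have
        # stripped out; discovery is trial and error.
        try:
            import lzma
        except ImportError:
            lzma = None

        def compress(data: bytes) -> bytes:
            if lzma is None:
                raise RuntimeError(
                    "lzma is unavailable on this Python build; "
                    "install your distribution's lzma package"
                )
            return lzma.compress(data)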

    Take it all away


    "What about tidying a little every day? Although it sounds convincing, do not be fooled. The reason it seems that the tidying never ends is precisely because you tidy a little at a time. [...] The main secret of success is this: if you tidy up in one fell swoop, rather than gradually, you can change your mind-set and your habits for good."
    Marie Kondo, "The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing" (pp. 15-16)

    While gradually shrinking the standard library is a step in the right direction, gradual changes alone are not enough. As Marie Kondo says, if you really want to put things in order, the first step is to take everything out where you can see it all at once, and then bring back only what is needed.

    The time has come to thank the modules that no longer spark joy and send them on their way.
    We need a version of Python that contains only the bare minimum, so that all implementations can be consistent with one another, and so that applications (even those running in web browsers or on microcontrollers) can simply state their requirements in requirements.txt.

    In some corporate environments the idea of a huge standard library seems attractive, because adding dependencies to requirements.txt involves bureaucracy; but the boundaries of the "standard library" in such environments are purely arbitrary.

    It might still be a good idea to ship some binary distributions of CPython (possibly even official ones) with a wide selection of modules from PyPI. Indeed, even everyday tasks require a certain amount of stdlib, if only the parts pip needs in order to install the other necessary modules.

    We already have a situation today where pip is distributed along with Python but is not developed in the CPython repository. Part of what the default Python installer ships is developed in the CPython repository, and part arrives separately from the interpreter's tarball.
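    The seam is visible from within the language itself: the stdlib carries only a small bootstrap module, ensurepip, which unpacks a pip wheel vendored alongside the interpreter (the snippet uses the real ensurepip API):

        # pip ships with Python yet is developed elsewhere; the stdlib
        # only bootstraps a vendored pip wheel on demand.
        import ensurepip

        print(ensurepip.version())  # version of the bundled pip wheel
        # ensurepip.bootstrap()     # would install that pip into this
        #                           # environment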

    To use Linux, we need installation media carrying a huge range of additional programs. But that does not mean the Linux kernel itself lives in one giant repository where hundreds of applications needed for a working Linux server are developed by a single team. The Linux kernel is enormously valuable, but the operating systems that use it are built by combining the kernel with a wide range of separately developed libraries and programs.

    Conclusion


    The "batteries included" philosophy was ideal at its inception; like a booster rocket, it carried Python up to the programming public. But as the open source ecosystem and Python packaging mature, this strategy is becoming obsolete, and, like any booster, we must let it fall back to earth so that it does not drag us down with it.

    New Python runtimes, new deployment tasks, and new developer audiences all provide the Python community with tremendous opportunities to reach new heights.

    But to achieve this we need a new, more compact, unburdened Python core. We need to dump the entire standard library out onto the floor and keep only the smallest pieces, the ones we look at and say "this is truly necessary", rather than "this is merely nice to have".

    I hope I have convinced at least some of you what kind of Python core we need.

    And now: who wants to write the PEP?
