Python on wheels

Original author: Armin Ronacher
Python's packaging infrastructure has long been criticized by both developers and system administrators. For a long time even the community itself could not agree on which tools to use in each particular case. There are distutils, setuptools, distribute, and distutils2 as basic distribution mechanisms, and virtualenv, buildout, easy_install, and pip as high-level tools for managing all this mess.

Prior to setuptools, the main distribution format was source archives, or occasionally binary MSI installers for Windows. Under Linux there were the initially broken bdist_dumb and bdist_rpm, of which the latter worked only on Red Hat-based systems. But even bdist_rpm did not work well enough for people to actually use it.

A few years ago PJE (Phillip J. Eby) tried to fix this problem by providing a mix of setuptools and pkg_resources to improve distutils and add metadata to Python packages. On top of this he wrote the easy_install utility to install packages. Because there was no distribution format that supported metadata, the 'egg' format was introduced.

Python eggs are ordinary zip archives containing a Python package plus the necessary metadata. Although many people have probably never intentionally built an egg, the egg metadata format is still alive and well, and everyone deploys their projects using setuptools.
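
Since an egg is just a zip archive, you can peek inside one with nothing but the standard library. A minimal sketch, assuming a hypothetical egg file in the current directory:
import zipfile

with zipfile.ZipFile('example_pkg-1.0-py2.7.egg') as egg:
    for name in egg.namelist():
        # e.g. example_pkg/__init__.py, EGG-INFO/PKG-INFO, ...
        print(name)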

Unfortunately, some time later the community split, and part of it proclaimed the death of binary formats and of 'eggs' in particular. After that pip, which replaced easy_install, stopped accepting the egg format.

Then a little more time passed, and the rejection of binary packages began to cause inconvenience. People were deploying more and more often to cloud servers, and the need to recompile C libraries on every machine pleased no one. Since eggs had by then fallen out of favor (I assume), the format was reworked in new PEPs and christened 'wheels'.

From here on, all actions are assumed to take place inside a virtualenv.

What is a wheel?


Let's start with the simple part. What are 'wheels' and how do they differ from 'eggs'? Both formats are zip archives. The main difference is that an egg can be imported without unpacking, while a wheel has to be unpacked first. Although there are no technical reasons for making wheels non-importable, support for importing them directly was never planned.
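
To see why eggs count as importable, here is a minimal sketch: Python's zipimport machinery happily imports straight from a zip archive placed on sys.path (the egg filename and package name are hypothetical):
import sys

sys.path.insert(0, 'example_pkg-1.0-py2.7.egg')
import example_pkg  # imported directly from the zip archive, no unpacking
A wheel, by contrast, is meant to be installed by pip, which unpacks it into site-packages first.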

Another difference is that eggs contain compiled bytecode while wheels do not. The main advantage is that there is no need to build a separate wheel for each version of Python, unless you distribute modules linked against libpython. On new versions of Python 3 even that can be pulled off, thanks to the stable ABI.

However, the wheel format is not free of problems either, and some of them it inherits from eggs. For example, binary distributions under Linux are still unacceptable to most people because of two drawbacks: Python itself is compiled under Linux in different flavors, and modules link against different system libraries. The first problem is caused by the coexistence of two incompatible Unicode builds of Python 2, UCS2 and UCS4: the ABI changes depending on the compilation mode, and at present the wheel format (as far as I can tell) does not record which Unicode mode a library was built against. A separate problem is that Linux distributions are less compatible with one another than we would like, and it can happen that a binary built for one distribution will not work on others.
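
For reference, the Unicode mode a Python 2 interpreter was compiled in can be checked from the interpreter itself:
import sys

# Narrow (UCS2) builds cap code points at 0xFFFF; wide (UCS4) builds do not.
if sys.maxunicode > 0xFFFF:
    print('UCS4 (wide) build')
else:
    print('UCS2 (narrow) build')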

All this boils down to the fact that, generally speaking, binary wheels cannot currently be uploaded to PyPI, since they are incompatible across systems.

On top of all this, wheels currently know only two extremes: binary packages and packages containing pure Python code. Binary packages are tied to one exact version of Python on the 2.x branch. Right now this does not look like a big problem, because the 2.x cycle is coming to an end and packages built only for 2.7 will suffice for a long time. But if Python 2.8 suddenly came along, it would be useful to be able to say that a package does not depend on the Python version, yet contains binaries and therefore cannot be architecture-independent.

The only case justifying the existence of such a package is when it contains shared libraries loaded with ctypes or CFFI. Such libraries are not linked against libpython and are independent of the language implementation (they can even be used with PyPy).
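
A minimal sketch of that pattern, assuming a hypothetical shared library _native.so shipped inside the package and exporting an add function:
import ctypes
import os

# Load the bundled library by absolute path; no libpython linkage involved.
_here = os.path.dirname(os.path.abspath(__file__))
_lib = ctypes.CDLL(os.path.join(_here, '_native.so'))

_lib.add.argtypes = [ctypes.c_int, ctypes.c_int]
_lib.add.restype = ctypes.c_int

def add(a, b):
    return _lib.add(a, b)
Because ctypes does all the loading, the same wheel works under CPython and PyPy alike.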

But there is a bright side: nothing prohibits the use of binary wheels within your own homogeneous infrastructure.

Building a wheel


So now we know what a wheel is. How do you make one of your own? Building a wheel from your own libraries is a simple process. All you need are fresh versions of setuptools and the wheel library. Once both are installed, a wheel is built with the following command:
$ python setup.py bdist_wheel

The wheel will be created in the dist directory of the package. However, there is one thing to be wary of: distributing binaries. By default the built wheel (assuming no binary build steps are used in setup.py) is considered pure Python. This means that even if you distribute .so, .dylib, or .dll files as part of your package, the resulting wheel will look platform-independent.

The solution to this problem is to subclass Distribution from setuptools, overriding the purity flag to False:
from setuptools import setup
from setuptools.dist import Distribution

class BinaryDistribution(Distribution):
    """Force the wheel to be tagged as platform-specific."""
    def is_pure(self):
        return False

setup(
    ...,
    include_package_data=True,   # pick up the bundled binary files
    distclass=BinaryDistribution,
)

Installing a wheel


With a recent version of pip, a wheel is installed as follows:
$ pip install package-1.0-cp27-none-macosx_10_7_intel.whl
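
The tags baked into that filename tell pip what the wheel is compatible with; the breakdown below follows PEP 427 (an optional build tag can also appear as a sixth component):
filename = 'package-1.0-cp27-none-macosx_10_7_intel.whl'
name, version, python_tag, abi_tag, platform_tag = filename[:-4].split('-')
print(python_tag)    # cp27 -> CPython 2.7
print(abi_tag)       # none -> no specific ABI requested
print(platform_tag)  # macosx_10_7_intel -> an OS X binary build
A pure-Python wheel would instead end in something like py27-none-any.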

But what about dependencies? This is where it gets a little harder. Usually one of the requirements for a package is the ability to install it even without an internet connection. Fortunately, pip lets you disable fetching from the index and instead point it at a directory containing everything needed for the installation. If we have wheels of the required versions for all our dependencies, we can do the following:
$ pip install --no-index --find-links=path/to/wheels package==1.0

This installs version 1.0 of package into our virtual environment.

Wheels for dependencies


Okay, but what if we don't have .whl files for all of our dependencies? In theory pip solves this problem with its wheel command. It should work something like this:
$ pip wheel --wheel-dir=path/to/wheels package==1.0

This command collects wheels for all the packages our package depends on into the specified folder. But there are a couple of problems.
The first is that the command currently has a bug: it does not copy over dependencies that are already wheels. So if a dependency is already available on PyPI in wheel format, it will not end up in our directory.

This can be temporarily worked around with a shell script that manually moves the downloaded wheels out of the download cache.
#!/bin/sh
WHEEL_DIR=path/to/wheels
DOWNLOAD_CACHE_DIR=path/to/cache

# Start from an empty download cache so it holds only this run's files.
rm -rf $DOWNLOAD_CACHE_DIR
mkdir -p $DOWNLOAD_CACHE_DIR

pip wheel --use-wheel -w "$WHEEL_DIR" -f "$WHEEL_DIR" \
  --download-cache "$DOWNLOAD_CACHE_DIR" package==1.0

# Wheels fetched from PyPI stay in the cache under URL-encoded names;
# move them into the wheel dir, stripping everything up to the last %2F.
for x in "$DOWNLOAD_CACHE_DIR/"*.whl; do
  mv "$x" "$WHEEL_DIR/${x##*%2F}"
done

The second problem is a little more serious: how will pip find our own package if it is not on PyPI? That's right, it won't. For this case the documentation recommends using not pip wheel package but pip wheel -r requirements.txt, where requirements.txt lists all the necessary dependencies.

Building packages using DevPI


This workaround for the dependency problem is quite usable in simple situations, but what if there are many internal Python packages depending on each other? The scheme quickly falls apart.

Fortunately, last year Holger Krekel created a solution to this misery called DevPI, which is essentially a local PyPI emulator for pip. Once installed on a computer, DevPI acts as a transparent proxy in front of PyPI and lets pip install packages from a local repository. On top of that, all packages downloaded from PyPI are automatically cached, so even if you disconnect from the network, the cached packages remain available for installation. Finally, it becomes possible to upload your own packages to the local server and refer to them exactly like packages stored in the public index.

I recommend installing DevPI into a local virtualenv and then symlinking devpi-server and devpi somewhere on your PATH.
$ virtualenv devpi-venv
$ devpi-venv/bin/pip install --upgrade pip wheel setuptools devpi
$ ln -s `pwd`/devpi-venv/bin/devpi ~/.local/bin
$ ln -s `pwd`/devpi-venv/bin/devpi-server ~/.local/bin

After that it only remains to start devpi-server, and it will keep running until stopped manually.
$ devpi-server --start

After starting it, you need to initialize it once:
$ devpi use http://localhost:3141
$ devpi user -c $USER password=
$ devpi login $USER --password=
$ devpi index -c yourproject

Since I use DevPI 'for myself', the DevPI username matches my system username. In the last step an index is created, named after the project (you can create several if necessary).

To redirect pip to the local repository, you can export the environment variable:
$ export PIP_INDEX_URL=http://localhost:3141/$USER/yourproject/+simple/

I put this export into the postactivate script of my virtualenv to prevent accidental installs from the wrong index.

To upload your own wheels to the local DevPI, use the devpi utility:
$ devpi use yourproject
$ devpi upload --no-vcs --formats=bdist_wheel

The --no-vcs flag disables magic that tries to detect the version control system and copies checked-in files first. I don't need this, because my projects ship files that I do not keep under version control (binaries, for example).

Finally, I highly recommend setting up your setup.py files in such a way that PyPI rejects them while DevPI accepts them, so that you don't accidentally publish your code with setup.py release. The easiest way to achieve this is to add an invalid PyPI classifier:
setup(
    ...
    classifiers=['Private :: Do Not Upload'],   # PyPI rejects unknown classifiers
)

Wrap up


Now everything is ready for using internal dependencies and building your own wheels. As soon as you have them, they can be archived, copied to another server, and installed into a separate virtualenv.
The whole process will get a little easier once pip wheel stops ignoring existing wheel packages. In the meantime, the shell script above is not the worst solution.

Compared to 'eggs'


Right now the wheel format is more attractive than the egg. Its development is more active, PyPI has begun adding support for it, and since the tooling is starting to work with it, it looks like the better solution. Eggs are still supported only by easy_install, although most people switched to pip long ago.

I believe the Zope community is still the largest egg-based and buildout-based community, and if an egg-based solution works in your case, by all means use it. I know that many do not use eggs at all, preferring to create virtualenvs, archive them, and ship them to different servers. Wheels are the better solution precisely for this kind of deployment, because different servers may have different library paths. There used to be a problem where .pyc files were created on the build server for the virtualenv, and those files contain specific file paths. With wheels, the .pyc files are created after installation into the virtual environment and automatically contain the correct paths.
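
To see the path problem concretely, here is a minimal sketch: the source path is frozen into the bytecode at compile time, so a function from a module byte-compiled on a build server reports the build server's paths (the module and function names are hypothetical):
import example_pkg

# Prints the path recorded when the .pyc was created, not the local path;
# this is what ends up in tracebacks on every machine the files are copied to.
print(example_pkg.some_function.__code__.co_filename)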

So there you have it: Python on wheels. It kind of even works, and it is probably worth the time invested.
