Porting to Python 3: Working on the Mistakes
Note from the translator: I present to you a translation of an interesting article by Armin Ronacher, author of the Flask and Werkzeug web frameworks and the Jinja2 template engine, and a well-known Pythonista in general, about the techniques and pitfalls he currently uses in his projects when adding support for Python 3. A short note about the title of this article: it is a reference to Armin's 2010 article "Porting to Python 3. A Guide", in which he described preparing code for automatic porting through the 2to3 utility. As practice shows, today that approach is closer to an antipattern: on the one hand, the quality of the resulting code deteriorates noticeably, and on top of that, such code is noticeably harder to maintain.
After an extremely painful experience porting Jinja2 to Python 3, I had to leave the project idle for a while, because I was too afraid of breaking its Python 3 support. The approach I used was to write code for Python 2 and translate it to Python 3 with 2to3 at package installation time. The most unpleasant side effect was that any change you make takes about a minute to translate, killing your iteration speed. Fortunately, it turned out that if you pin down the target Python version correctly, the process goes significantly faster.
Thomas Waldmann from the MoinMoin project started by running Jinja2 through my python-modernize with the right parameters and arrived at a single code base that works under 2.6, 2.7 and 3.3. With a few small tweaks, we were able to get to a nice code base that works with all those Python versions and, for the most part, looks like regular Python code.
Inspired by this result, I went over the code a few more times and then started converting some other code to experiment further with a unified code base.
In this article, I will selectively go over some tips and tricks, in case they help anyone in a similar situation.
Drop support for 2.5, 3.1 and 3.2
This is one of the most important tips. Dropping Python 2.5 support today is entirely realistic, because not that many people still use it. Dropping 3.1 and 3.2 is an easy decision, given the low adoption of Python 3 so far. But what is the point of dropping these versions? In short, 2.6 and 3.3 share a large amount of overlapping syntax and features that let the same code run fine on both:
- Compatible string literals. 2.6 and 3.3 support the same syntax for strings: you can use 'foo' for native string types (byte strings in 2.x and unicode strings in 3.x), u'foo' for unicode strings and b'foo' for byte strings or bytes objects.
- Compatible print syntax. If you use print, you can add from __future__ import print_function and use print as a function, without having to resort to a wrapper function and suffer other incompatibilities.
- Compatible exception-catching syntax. Python 2.6 introduced the new syntax except Exception as e, which is the one used in 3.x.
- Class decorators are available. They are extremely useful for automatically correcting moved interfaces without leaving traces on the class structure. For example, they can automatically rename a method from next to __next__, or from __str__ to __unicode__, on Python 2.x.
- The built-in next() function calls next or __next__. Conveniently, it works at about the same speed as a direct method call, so you pay no performance penalty compared to runtime checks or your own wrapper function.
- Python 2.6 added the bytearray type with the same interface as in 3.3. This is useful because, while Python 2.6 lacks a real bytes object, it does have a built-in of that name which is merely an alias for str and behaves completely differently.
- Python 3.3 brought back the bytes-to-bytes and string-to-string codecs that were broken in 3.1 and 3.2. Unfortunately their interfaces became more complex and the aliases are gone, but it is all much closer to what 2.x had than before. This is especially important if you need stream-based encoding, which was completely absent from 3.0 through 3.2.
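To make the overlap concrete, here is a small sketch (not from the original article) of a file that is valid under both Python 2.6+ and 3.3+, exercising several of the features listed above:

```python
# Runs unchanged on Python 2.6+ and 3.3+ (illustrative sketch).
from __future__ import print_function

# except ... as ...: the capture syntax shared by 2.6+ and 3.x
try:
    int('not a number')
except ValueError as e:
    caught = e.__class__.__name__

native = 'foo'    # native string: bytes on 2.x, unicode on 3.x
text = u'foo'     # unicode on both
data = b'foo'     # bytes on 3.x, str on 2.x

# bytearray has the same interface on 2.6 and 3.3
buf = bytearray(b' foo ')
print(caught, buf.strip() == bytearray(b'foo'))
```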
Yes, six will help you move forward, but don't underestimate the value of looking at clean code. The combined code base nearly made me lose interest in maintaining the Python 3 port of Jinja2, because I was horrified by its code. At the time, the combined code either looked ugly and suffered performance-wise (constant six.b('foo') and six.u('foo') calls), or it suffered from the slow iteration speed of 2to3. Now, having dealt with all of that, I am enjoying it again. The Jinja2 code looks very clean, and you have to search to find the Python 2/3 compatibility support. Only a few pieces of code do something in the style of if PY2:.
The rest of the article assumes you want to support exactly these versions of Python. Attempts to keep supporting Python 2.5 are very painful, and I strongly recommend you refrain from them. Supporting 3.2 is possible if you are willing to wrap all your string literals in function calls, which I personally would not recommend for reasons of aesthetics and performance.
Give up six
Six is a pretty neat library, and Jinja2 started out with it. But in the end, if you count, six does not provide that many of the things you need for a Python 3 port. Of course, six is necessary if you intend to support Python 2.5, but starting with 2.6 there are not many reasons left to use it. Jinja2 has a _compat module that contains the necessary helpers. Even including a few helpers that six does not provide, the entire compatibility module is less than 80 lines of code.
This saves your users from problems when another library expects a different version of six, and it saves you from adding another dependency to your project.
Start with Modernize
Python-modernize is a good tool to start the port with. It is a version of 2to3 that generates code that runs on both versions of Python. Even though it has its share of bugs and the default options are not the most optimal, it can move you seriously forward by doing the boring work for you. You will still have to go over the code afterwards and clean up some imports and rough edges.
Correct your tests
Before you do anything else, go over your tests and make sure they still make sense. A large number of problems in the Python 3.0 and 3.1 standard library appeared because the behavior of tests changed inadvertently during porting.
Write a compatibility module
So, if you decide to give up six, can you live without helpers? The correct answer is no. You still need a small compatibility module, but it should be small enough for you to keep it in your package. Here is a simple example of how a compatibility module might look:
import sys

PY2 = sys.version_info[0] == 2

if not PY2:
    text_type = str
    string_types = (str,)
    unichr = chr
else:
    text_type = unicode
    string_types = (str, unicode)
    unichr = unichr
What goes into this module depends on how much you need to change. In Jinja2's case I put a number of functions there. For example, it contains ifilter, imap and other similar functions from itertools whose behavior became that of the built-ins in 3.x (I use the 2.x names so the reader understands that the use of iterators is intentional and not an accident).
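For illustration, that aliasing might look like this in a compatibility module (a sketch following the article's naming convention, not Jinja2's actual _compat code):

```python
import sys
import itertools

PY2 = sys.version_info[0] == 2

if PY2:
    # On 2.x these come from itertools (only evaluated on 2.x).
    imap = itertools.imap
    ifilter = itertools.ifilter
    izip = itertools.izip
else:
    # On 3.x the built-ins already return iterators.
    imap = map
    ifilter = filter
    izip = zip

# The 2.x names signal that iterator behavior is intentional.
print(list(imap(str.upper, ['a', 'b'])))
```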
Check for 2.x, not for 3.x
At some point you will have to check whether the code is running on the 2.x or the 3.x version of Python. In such cases I recommend checking for the second version and putting the third-version branch in the else, rather than the other way around. That way you will get fewer unpleasant surprises if a Python 4 ever appears.
Good:
if PY2:
    def __str__(self):
        return self.__unicode__().encode('utf-8')
Not so perfect:
if not PY3:
    def __str__(self):
        return self.__unicode__().encode('utf-8')
String handling
The biggest change in Python 3 was, without a doubt, the change to the Unicode interfaces. Unfortunately these changes turned out to be rather painful in places and were applied inconsistently across the standard library. Most of your porting time will be spent at this stage. This is really a topic for a separate article, but here is the short list of rules that Jinja2 and Werkzeug stick to:
- 'foo' always means what I call the native string type: the strings used for identifiers, source code, filenames and other low-level functions. In addition, on 2.x such literals are acceptable where unicode strings are expected, but only as long as they contain nothing but ASCII characters.
This property is very useful for a unified code base, because the general trend in Python 3 is to add Unicode support to interfaces that did not support it before, and never the other way around. Since native string literals are "upgraded" to Unicode in 3.x while still working as ASCII-only unicode on 2.x, they can be very handy.
For example, datetime.strftime does not support Unicode at all in Python 2, but is unicode-only in Python 3. Since in most cases the return value on 2.x was pure ASCII, things like this work on both 2.x and 3.x:
>>> u'Current time: %s' % datetime.datetime.utcnow().strftime('%H:%M')
u'Current time: 23:52'
The string passed to strftime is native (bytes on 2.x, unicode on 3.x). The return value is again a native string and pure ASCII, so a properly formatted unicode string comes back on both 2.x and 3.x.
- u'foo' always means a unicode string. A large number of libraries already support Unicode perfectly well on 2.x, so unicode literals will not surprise anyone.
- b'foo' always means something that can hold real bytes. Since 2.6 does not actually have a bytes object, while Python 3.3 in turn lacks real byte strings, the usefulness of this literal is somewhat limited. It becomes useful again in tandem with bytearray, which has the same interface on 2.x and 3.x:
>>> bytearray(b' foo ').strip()
bytearray(b'foo')
Since it is mutable, you can do your byte-level work on a bytearray and then convert the result back into something more familiar by wrapping it in bytes().
In addition to these simple rules, I added the text_type, unichr and string_types variables to my compatibility module, as shown above. With them, the following substitutions happen:
- isinstance(x, basestring) becomes isinstance(x, string_types)
- isinstance(x, unicode) becomes isinstance(x, text_type)
- isinstance(x, str), where byte handling is intended, becomes isinstance(x, bytes) or isinstance(x, (bytes, bytearray))
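A quick sketch of those substitutions in action, using the compatibility variables defined earlier (the 2.x branch is shown for completeness and is never evaluated on 3.x):

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode              # only evaluated on 2.x
    string_types = (str, unicode)
else:
    text_type = str
    string_types = (str,)

# isinstance(x, basestring)  ->  isinstance(x, string_types)
print(isinstance(u'foo', string_types))   # True on 2.x and 3.x
# isinstance(x, unicode)     ->  isinstance(x, text_type)
print(isinstance(u'foo', text_type))      # True
# bytes are not text on either version
print(isinstance(b'foo', text_type))      # False
```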
I also wrote a class decorator, implements_to_string, that helps implement classes with __str__ and __unicode__ methods:
if PY2:
    def implements_to_string(cls):
        cls.__unicode__ = cls.__str__
        cls.__str__ = lambda x: x.__unicode__().encode('utf-8')
        return cls
else:
    implements_to_string = lambda x: x
The idea is that you implement __str__ on both 2.x and 3.x and let it return unicode strings (yes, that looks somewhat odd on 2.x), and the decorator automatically renames it to __unicode__ on 2.x and adds a __str__ that calls __unicode__ and encodes the result to utf-8. This approach has become quite widespread in 2.x/3.x modules recently; Jinja2 and Django both do it, for example. Here is a usage example:
@implements_to_string
class User(object):
    def __init__(self, username):
        self.username = username

    def __str__(self):
        return self.username
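Pasted together with the decorator, the example becomes a self-contained check (on 3.x the decorator is a no-op, so str() just returns the unicode value; on 2.x it returns the utf-8-encoded byte string):

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    def implements_to_string(cls):
        cls.__unicode__ = cls.__str__
        cls.__str__ = lambda x: x.__unicode__().encode('utf-8')
        return cls
else:
    implements_to_string = lambda x: x

@implements_to_string
class User(object):
    def __init__(self, username):
        self.username = username

    def __str__(self):
        return self.username

print(str(User(u'armin')))  # 'armin' as a native string on both versions
```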
Metaclass Syntax Changes
Since the Python 3 syntax for declaring metaclasses is incompatible with Python 2, porting gets a bit more complicated. Six has a with_metaclass function designed to solve this problem: it creates a dummy class, which then remains visible in the inheritance tree. I did not like that solution for Jinja2, so I changed it. The external API stays the same, but the implementation uses a temporary class to attach the metaclass. The advantage of this approach is that you pay no performance cost for using it, and the inheritance tree stays clean.
The code is somewhat tricky to understand. The main idea is to exploit the metaclass's ability to customize class creation, which the parent class takes part in. The function creates a dummy class with a dummy metaclass. When that class is subclassed, the dummy metaclass's __new__ kicks in and constructs the new class from the correct bases with the intended metaclass, removing the dummy parent from the inheritance tree in the process. As a result, neither the dummy class nor the dummy metaclass are ever visible.
Here's what it looks like:
def with_metaclass(meta, *bases):
    class metaclass(meta):
        __call__ = type.__call__
        __init__ = type.__init__
        def __new__(cls, name, this_bases, d):
            if this_bases is None:
                return type.__new__(cls, name, (), d)
            return meta(name, bases, d)
    return metaclass('temporary_class', None, {})
And here is how you use it:
class BaseForm(object):
    pass

class FormType(type):
    pass

class Form(with_metaclass(FormType, BaseForm)):
    pass
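You can check that the trick works as claimed: the metaclass is applied, and the temporary class never shows up in the MRO. A small verification sketch:

```python
def with_metaclass(meta, *bases):
    # Same helper as above, repeated so the sketch is self-contained.
    class metaclass(meta):
        __call__ = type.__call__
        __init__ = type.__init__
        def __new__(cls, name, this_bases, d):
            if this_bases is None:
                return type.__new__(cls, name, (), d)
            return meta(name, bases, d)
    return metaclass('temporary_class', None, {})

class BaseForm(object):
    pass

class FormType(type):
    pass

class Form(with_metaclass(FormType, BaseForm)):
    pass

print(type(Form) is FormType)                    # metaclass applied
print(Form.__mro__ == (Form, BaseForm, object))  # no temporary class left
```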
Dictionaries
One of the annoying changes in Python 3 concerns the dictionary iteration protocols. In Python 2, all dictionaries have keys(), values() and items() methods that return lists, and iterkeys(), itervalues() and iteritems() methods that return iterators. In Python 3, none of the latter exist. Instead, the methods were replaced by ones returning view objects: keys() returns a view that behaves like an immutable set, values() returns an iterable read-only container (but not an iterator!), and items() returns something resembling an immutable set. Unlike regular sets, view objects can also point to mutable values, in which case some of their methods may fail at runtime.
Although many people miss the point that view objects are not iterators, in most cases you can simply ignore it. Werkzeug and Django implement several dictionary-like objects of their own, and in both cases the solution was to simply ignore the existence of view objects and have keys() and friends return iterators.
At the moment, this is the only reasonable solution, given the restrictions imposed by the Python interpreter. It comes with problems:
- The fact that view objects are not themselves iterators means you create temporary objects for no particular reason.
- The set-like behavior of the built-in dictionary views cannot be reproduced in pure Python due to interpreter limitations.
- Implementing view objects for 3.x and iterators for 2.x would mean a lot of code duplication.
Here is what Jinja2 settled on for iterating over dictionaries:
if PY2:
    iterkeys = lambda d: d.iterkeys()
    itervalues = lambda d: d.itervalues()
    iteritems = lambda d: d.iteritems()
else:
    iterkeys = lambda d: iter(d.keys())
    itervalues = lambda d: iter(d.values())
    iteritems = lambda d: iter(d.items())
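These helpers work on plain dicts as well as on your own mapping objects; a quick sketch:

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    iterkeys = lambda d: d.iterkeys()
    iteritems = lambda d: d.iteritems()
else:
    iterkeys = lambda d: iter(d.keys())
    iteritems = lambda d: iter(d.items())

d = {'a': 1, 'b': 2}
# Both helpers hand back real iterators, not lists or views.
print(sorted(iterkeys(d)))   # ['a', 'b']
print(sorted(iteritems(d)))  # [('a', 1), ('b', 2)]
```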
To implement dictionary-like objects, a class decorator helps us once again:
if PY2:
    def implements_dict_iteration(cls):
        cls.iterkeys = cls.keys
        cls.itervalues = cls.values
        cls.iteritems = cls.items
        cls.keys = lambda x: list(x.iterkeys())
        cls.values = lambda x: list(x.itervalues())
        cls.items = lambda x: list(x.iteritems())
        return cls
else:
    implements_dict_iteration = lambda x: x
In this case, all you have to do is implement keys() and friends as iterators, and everything else happens automatically:
@implements_dict_iteration
class MyDict(object):
    ...

    def keys(self):
        for key, value in iteritems(self):
            yield key

    def values(self):
        for key, value in iteritems(self):
            yield value

    def items(self):
        ...
General Iterator Changes
Since iterators changed fundamentally, a couple of helpers are needed to fix things up. In fact, the only change was the move from next() to __next__. Fortunately, this is already handled transparently: the only thing you need to do is replace x.next() with next(x), and Python takes care of the rest. If you define your own iterators, once again a class decorator helps:
if PY2:
    def implements_iterator(cls):
        cls.next = cls.__next__
        del cls.__next__
        return cls
else:
    implements_iterator = lambda x: x
To implement such a class, simply name the iteration-step method __next__:
@implements_iterator
class UppercasingIterator(object):
    def __init__(self, iterable):
        self._iter = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iter).upper()
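A quick check of the iterator above; on 3.x the decorator is the identity function, so the sketch runs as-is:

```python
implements_iterator = lambda x: x  # the 3.x branch from above

@implements_iterator
class UppercasingIterator(object):
    def __init__(self, iterable):
        self._iter = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iter).upper()

print(list(UppercasingIterator(['hello', 'world'])))  # ['HELLO', 'WORLD']
```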
Codec changes
One of the nice features of the encoding protocol in Python 2 was its type independence. You could register an encoding that converted a csv file into a numpy array if you needed it. This capability, however, was not widely known, because string encoding was the main interface shown off. In 3.x the types became stricter, so most of this functionality was removed in 3.0 and only returned in 3.3, because it had proven its worth. Simply put, codecs not involved in converting between Unicode and bytes were unavailable until 3.3, among them the hex and base64 codecs.
These codecs have two use cases: operations on strings and operations on data streams. The good old str.encode() from 2.x no longer covers the first case in 3.x. If you want to support both 2.x and 3.x despite the changed string API, call the codec machinery directly:
>>> import codecs
>>> codecs.encode(b'Hey!', 'base64_codec')
'SGV5IQ==\n'
You will also notice that the codecs lost their aliases in 3.3, so you need to spell out 'base64_codec' instead of 'base64'. Using these codecs is preferable to using the functions from the binascii module, because they support operations on data streams through incremental encoding and decoding.
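On 3.3+ the codecs functions also decode, so a bytes-to-bytes round trip is a one-liner each way. A sketch:

```python
import codecs

# Encode and decode bytes-to-bytes through the base64 codec.
encoded = codecs.encode(b'Hey!', 'base64_codec')
decoded = codecs.decode(encoded, 'base64_codec')
print(decoded == b'Hey!')  # True
```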
Other notes
There are also several points for which I still have no good solution, or which are annoying but rare enough that I do not want to deal with them. Some of them, unfortunately, are part of the Python 3 API and are almost invisible until you start looking at the edge cases.
- Filesystem access and file IO remain annoying on Linux, because they are not Unicode-based. The open() function and the filesystem layer can have dangerous defaults. If, for example, I ssh from my de_AT machine into a machine with an en_US locale, Python likes to fall back to ASCII encoding for both filesystem and file operations.
In general, I find the most reliable way to work with text in Python 3 that also works fine on 2.x is simply to open files in binary mode and decode explicitly. Alternatively, you can use codecs.open or io.open on 2.x and the built-in open on 3.x with an explicit encoding.
- URLs in the standard library are incorrectly represented as Unicode, which can make some URLs unusable on 3.x.
- Reraising an exception with a traceback object requires a helper function, since the syntax changed. This is generally not a common problem, and it is easily solved with a wrapper. Because the syntax changed, the 2.x version has to go inside an exec block:
if PY2:
    exec('def reraise(tp, value, tb):\n raise tp, value, tb')
else:
    def reraise(tp, value, tb):
        raise value.with_traceback(tb)
- The exec hack above is useful whenever you have syntax-dependent code. But since the syntax of exec itself changed, you no longer have a direct way to execute something with an arbitrary namespace. This is not a big problem, because eval and compile can serve as a replacement that works on both versions. You can declare an exec_ function through them:
exec_ = lambda s, *a: eval(compile(s, '', 'exec'), *a)
- If you have a C module written against the Python C API, you are out of luck. At the moment I am not aware of any tools that could help here. Take the opportunity to change how you write such modules and rewrite them with cffi or ctypes. If that is not an option because you have something like numpy, the only thing left is to humbly accept the pain. You could also try writing something nasty on top of the C preprocessor to make porting easier.
- Use tox for local testing. Being able to run the tests under all required Python versions at once is extremely handy and will save you a lot of trouble.
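To illustrate the reraise helper from the notes above, here is a self-contained sketch showing that the original exception survives the round trip (the 2.x branch only runs on 2.x; on 3.x the with_traceback version is used):

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    exec('def reraise(tp, value, tb):\n raise tp, value, tb')
else:
    def reraise(tp, value, tb):
        raise value.with_traceback(tb)

try:
    try:
        raise ValueError('boom')
    except ValueError:
        tp, value, tb = sys.exc_info()
        reraise(tp, value, tb)  # re-raise with the original traceback
except ValueError as e:
    message = str(e)

print(message)  # boom
```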
Conclusion
A unified code base for 2.x and 3.x is entirely possible today. Of course, most of the porting time will still go into figuring out how the new APIs behave with respect to Unicode and how the compatibility of various modules has changed. In any case, if you are porting libraries, don't bother with Python 2.5 and older, nor with 3.0-3.2, and you will spare yourself a lot of pain.