freetonik January 21, 2014 at 12:56

Why are there so many Pythons?

Translation

Python is amazing.

Surprisingly, this is a rather ambiguous statement. What do I mean by Python? The abstract Python interface? Or CPython, the most common Python implementation (not to be confused with the similarly named Cython)? Or do I mean something else entirely? Maybe I'm indirectly referring to Jython, or IronPython, or PyPy. Or maybe I've gone so far off the deep end that I'm talking about RPython or RubyPython (which are very different things).

Despite the similar names, some of these technologies serve completely different purposes (or, at least, work in completely different ways).

While working with Python, I have come across a bunch of these *ython tools. But only recently did I take the time to figure out what they are, how they work, and why each of them is needed in its own way.

In this post, I will start from scratch and walk through various Python implementations, and end with a detailed introduction to PyPy, which, in my opinion, is the future of the language.

It all starts with understanding what Python really is.

If you have a good understanding of machine code, virtual machines, and so on, you can skip this section.

Is Python interpreted or compiled?

This is a common source of misunderstanding among Python newbies.

The first thing to understand: “Python” is an interface. There is a specification that describes what Python should do and how it should behave (as with any interface). And there are several implementations (again, as with any interface).

Second: “interpreted” and “compiled” are properties of an implementation, not of the interface.

So the question itself is not quite well-formed.

In the case of the most common implementation (CPython: written in C, often called simply “Python”, and certainly the one you are using if you have no idea what I’m talking about), the answer is: interpreted, with some compilation. CPython compiles* the Python source code into bytecode and then interprets that bytecode, executing it as it goes.

* Note: this is not “compilation” in the traditional sense. Usually, “compilation” means translating a high-level language into machine code. Still, it is a kind of compilation.

Let's look at this answer more closely, as it will help us understand some of the concepts that come up later in this article.

Bytecode or machine code

It is very important to understand the difference between bytecode and machine (or native) code. Perhaps the easiest way to understand it is by example:

- C is compiled into machine code, which is then run directly by the processor. Each instruction tells the CPU to perform some operation.
- Java is compiled into bytecode, which is then run on the Java Virtual Machine (JVM), an abstraction of a computer that executes programs. Each instruction is handled by the JVM, which in turn interacts with the real machine.

Simplifying greatly: machine code is much faster, but bytecode is more portable and more secure.

Machine code differs from machine to machine, while bytecode is the same on all machines. You could say that machine code is optimized for your particular setup.

Returning to CPython, the chain of operations is as follows:

1. CPython compiles your Python source code into bytecode.
2. This bytecode runs on the CPython virtual machine.
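Both steps can be watched from inside CPython itself. The following is a minimal sketch using the standard library's compile() and the dis module. The expression and the variable names are made up for illustration, and the exact instructions printed differ between CPython versions.

```python
import dis

# Step 1: compile source text into a code object that holds bytecode.
source = "a + b"
code = compile(source, "<example>", "eval")

# The raw bytecode is just a bytes string...
raw_bytecode = code.co_code

# ...and dis prints it in a human-readable form
# (instruction names vary between CPython versions).
dis.dis(code)

# Step 2: the CPython virtual machine executes the bytecode.
result = eval(code, {"a": 2, "b": 3})
print(result)  # 5
```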

Beginners often assume that Python is compiled because of the .pyc files. There is some truth to that: a .pyc file holds compiled bytecode, which is then interpreted. So if you have run your Python code before and the .pyc file is on hand, it will start faster the second time, because the source does not have to be recompiled into bytecode.
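To see that a .pyc file is nothing more than stored bytecode, you can produce one by hand. This is a small sketch using the standard py_compile module. The file name hello.py and the temporary directory are assumptions for the example; normally CPython writes these files for you when a module is imported.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway source file (hypothetical name, for illustration only).
    src = os.path.join(tmp, "hello.py")
    with open(src, "w") as f:
        f.write("print('hello')\n")

    # Compile the source to bytecode and write it to a .pyc file.
    pyc_path = py_compile.compile(
        src, cfile=os.path.join(tmp, "hello.pyc"), doraise=True
    )

    pyc_exists = os.path.exists(pyc_path)
    pyc_size = os.path.getsize(pyc_path)
    print(pyc_path, pyc_size)
```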

Alternative virtual machines: Jython, IronPython and others

As I said above, Python has several implementations, and CPython is the most popular. It is written in C and is considered the “default” implementation.

But what about the alternatives? One of the most prominent is Jython, a Python implementation written in Java that uses the JVM. While CPython generates bytecode to run on the CPython VM, Jython generates Java bytecode to run on the JVM (the same kind of bytecode you get when compiling a Java program).

“Why would you ever use an alternative implementation?”, you ask. Well, for starters, different implementations play nicely with different technology stacks.

CPython makes it easy to write C extensions for your Python code because, in the end, that code is executed by an interpreter written in C. Jython, in turn, makes it easy to work with other Java programs: you can import any Java classes with no additional effort, and instantiate and use them from your Jython programs. (Note: if you haven’t thought about it closely, this is pretty crazy. We have lived to see the day when you can mix different languages and compile them all down to the same thing. As Rostin noted, programs that mix Fortran and C code have been around for a long time, so this is not entirely new. But it's still cool.)
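If you are not sure which implementation is running your code, the standard platform module can tell you. A minimal sketch; the printed values depend on your installation.

```python
import platform

# Name of the implementation executing this script,
# e.g. "CPython", "PyPy", "Jython" or "IronPython".
implementation = platform.python_implementation()

# Version of the Python language that this implementation provides.
language_version = platform.python_version()

print(implementation, language_version)
```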

As an example, here is valid Jython code:

[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_51
>>> from java.util import HashSet
>>> s = HashSet(5)
>>> s.add("Foo")
>>> s.add("Bar")
>>> s
[Foo, Bar]

IronPython is another popular Python implementation, written entirely in C# and targeting the .NET stack. In particular, it runs on what you might call the .NET virtual machine: Microsoft's Common Language Runtime (CLR), which is comparable to the JVM.

You could say that Jython : Java :: IronPython : C#. They run on the corresponding virtual machines, you can import C# classes into IronPython code and Java classes into Jython code, and so on.

It is entirely possible to get by without ever touching anything other than CPython. But switching to another implementation can pay off, mostly depending on your technology stack. Using a lot of JVM-based languages? Jython might be for you. Is everything on .NET? Then IronPython might be worth a try (and maybe you already have).

By the way, although this would not be a reason to switch implementations, it is worth mentioning that these implementations do differ in behavior, beyond how they treat your Python source code. These differences are usually minor, and they come and go over time because the implementations are under active development. For example, IronPython uses Unicode strings by default, whereas CPython defaults to ASCII in 2.x versions (failing with a UnicodeEncodeError for non-ASCII characters) but supports Unicode strings by default in 3.x versions.
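The Unicode behavior can be checked directly. The sketch below assumes a Python 3 interpreter, where str is a Unicode string. It only shows where a UnicodeEncodeError comes from when text is forced into ASCII, and does not reproduce the Python 2 behavior itself. The sample text is made up for the example.

```python
# Python 3: str holds Unicode characters by default.
text = "na\u00efve caf\u00e9"  # "naive cafe" with two accented letters

length = len(text)  # counts characters, not bytes

# Forcing the text into ASCII fails for the accented characters.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent them, using two bytes for each accented letter.
utf8_bytes = text.encode("utf-8")

print(length, ascii_ok, len(utf8_bytes))  # 10 False 12
```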

Compilation on the fly (Just-in-Time Compilation): PyPy and the future

So, we have a Python implementation written in C, another in Java, and a third in C#. The next logical step: a Python implementation written in... Python. (The educated reader will notice that this statement is slightly misleading.)

This is where things can get confusing. First, let's discuss just-in-time (JIT) compilation.

JIT: why and how

Recall that native machine code is much faster than bytecode. Well, what if you could compile some of the bytecode and run it as native code? You would have to “pay” a price (namely, time) to compile the bytecode, but if the result runs faster, that's great! This is the motivation for JIT compilation, a hybrid technique that combines the advantages of interpreters and compilers. In a nutshell, a JIT tries to use compilation to speed up an interpreted system.

For example, here is a common JIT approach:

1. Identify bytecode that is executed frequently.
2. Compile it into native machine code.
3. Cache the result.
4. Whenever the same bytecode needs to run again, use the already compiled machine code instead and reap the benefits (in particular, the speed increase).
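The "count, compile, cache, reuse" pattern from the steps above can be modelled in a few lines. Note that this is only a toy analogy, not a real JIT: it caches Python code objects rather than native machine code, and the class name HotCodeCache and the threshold of 3 are invented for this sketch.

```python
class HotCodeCache:
    """Toy model of a JIT's bookkeeping: compile hot code once, then reuse it.

    Assumption for illustration: "compiling" here means turning an expression
    string into a Python code object, not producing machine code.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.run_counts = {}  # how often each expression has been run
        self.compiled = {}    # cache: expression -> compiled code object

    def run(self, expr, env):
        cached = self.compiled.get(expr)
        if cached is not None:
            # Fast path: reuse the previously compiled result.
            return eval(cached, {}, env)

        # Slow path: count the execution and decide whether it is "hot".
        count = self.run_counts.get(expr, 0) + 1
        self.run_counts[expr] = count
        if count >= self.threshold:
            self.compiled[expr] = compile(expr, "<hot>", "eval")

        # Evaluating a string re-parses and re-compiles it on every call.
        return eval(expr, {}, env)


cache = HotCodeCache(threshold=3)
results = []
for i in range(5):
    results.append(cache.run("x * 2 + 1", {"x": i}))

print(results)  # [1, 3, 5, 7, 9]
```

The first calls take the slow path, and from the fourth call on the cached code object is used. A real JIT makes the same trade: it pays the compilation cost only for code that has proven to run often.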

That is the whole point of PyPy: bringing JIT to Python (see the appendix for earlier attempts). Of course, there are other goals too: PyPy aims to be cross-platform, to have a small memory footprint, and to support stackless operation (giving up the C call stack in favor of its own stack). But the JIT is the main selling point. Based on benchmark results, the average speedup factor is 6.27. For a more detailed breakdown, see the chart from the PyPy Speed Center.

PyPy is hard to figure out

PyPy has huge potential, and at this point it is highly compatible with CPython (so it can run Flask, Django, etc.).

But there is a lot of confusion around PyPy (see, for example, this nonsensical proposal to create a PyPyPy...). In my opinion, the main reason is that PyPy is actually two things at once:

1. A Python interpreter written in RPython (not Python; I fibbed earlier). RPython is a subset of Python with static typing. In Python, it is “mostly impossible” to reason rigorously about types. (Why is it so hard? Consider the following:

x = random.choice([1, "foo"])

This is valid Python code (credit to Ademan). What is the type of x? How can we reason about the types of variables when types are not even strictly enforced?) With RPython, you sacrifice some flexibility, but in return it becomes much, much easier to reason about memory management and more, which allows for optimizations.

2. A compiler that compiles RPython code for various targets and adds in the JIT. The default target is C, that is, an RPython-to-C compiler, but you can also target the JVM and others.
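The typing problem mentioned in the first item is easy to observe. In this short sketch the loop and the seen_types set are additions for illustration; only the random.choice line comes from the text above.

```python
import random

seen_types = set()
for _ in range(100):
    # The same line of code yields an int on some runs and a str on others.
    x = random.choice([1, "foo"])
    seen_types.add(type(x).__name__)

# No single static type can be assigned to x ahead of time.
print(sorted(seen_types))  # ['int', 'str']
```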

For ease of description, I will call them PyPy (1) and PyPy (2).

Why would you need these two things, and why under one roof? Think of it this way: PyPy (1) is an interpreter written in RPython. That is, it takes the user's Python code and compiles it into bytecode. But for the interpreter itself (written in RPython) to run, it must be interpreted by another Python implementation, right?

Well, you could simply use CPython to run the interpreter. But that would not be very fast.

Instead, we use PyPy (2) (known as the RPython Toolchain) to compile the PyPy interpreter into code for another platform (for example, C, the JVM, or the CLI) to run on the target machine, adding the JIT along the way. This is magical: PyPy dynamically adds a JIT to an interpreter, generating its own compiler! (Again, this is crazy: we compile an interpreter and get another separate, standalone compiler added in the process.)

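In the end, the result is a standalone executable that interprets Python source code and uses JIT optimizations. Exactly what we need! It is a lot to take in, but the pipeline boils down to this: the PyPy interpreter (RPython source) goes into the RPython Toolchain, and a standalone executable with a built-in JIT comes out.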

To reiterate: the real beauty of PyPy is that we could write a bunch of different Python interpreters in RPython without worrying about the JIT (apart from a couple of details). PyPy then implements the JIT for us using the RPython Toolchain / PyPy (2).

In fact, if we get even more abstract, you could in theory write an interpreter for any language, feed it to PyPy, and get a JIT for that language. This is possible because PyPy focuses on optimizing the interpreter itself, rather than the details of the language it interprets.

As a digression, I would like to note that the JIT itself is absolutely fascinating. It uses a technique called tracing, which works as follows:
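To give a feel for what "an interpreter for any language" might look like, here is a sketch of an interpreter for a made-up stack language. It is written in plain Python in a deliberately simple style. The instruction names (PUSH, ADD, MUL) are invented, and the code has not been passed through the RPython Toolchain, which as far as I understand needs additional entry-point and JIT-hint boilerplate that is omitted here.

```python
def run(program):
    """Interpret a toy stack language and return the final stack.

    `program` is a list of (opcode, argument) pairs; the opcodes are
    hypothetical and exist only for this example.
    """
    stack = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, arg = program[pc]
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError("unknown opcode: %s" % op)
        pc += 1
    return stack


# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", 0), ("PUSH", 4), ("MUL", 0)]
final_stack = run(program)
print(final_stack)  # [20]
```

The dispatch loop is the part such a toolchain would have to analyze: it is the interpreter itself, independent of which toy program is being run.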

1. Run the interpreter and interpret everything (without any JIT).
2. Do some light profiling of the interpreted code.
3. Identify operations that have been performed before.
4. Compile these pieces of code into machine code.
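The profiling part of these steps (steps 2 and 3) can be imitated in CPython with the standard sys.settrace hook. This is only a sketch of how "hot" lines might be found, not how PyPy's tracing JIT is implemented. The helper profile_lines and the sample function sum_squares are invented for the example.

```python
import sys
from collections import Counter


def profile_lines(func, *args):
    """Run func(*args) and count how often each of its lines executes.

    Keys of the returned Counter are line offsets relative to the `def` line.
    """
    counts = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the function under study.
        if frame.f_code is not code:
            return None
        if event == "line":
            counts[frame.f_lineno - code.co_firstlineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous hook
    return result, counts


def sum_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


result, counts = profile_lines(sum_squares, 100)
print(result)     # 328350
print(counts[3])  # 100: the loop body (offset 3) is the hot line
```

A tracing JIT would take the lines (more precisely, the operations) that show up this often and compile them, leaving the rarely executed setup and return code to the interpreter.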

You can learn more from this easily accessible and very interesting paper.

To summarize: we use PyPy's RPython-to-C compiler (or a compiler for another target platform) to compile PyPy's interpreter, which is implemented in RPython.

Conclusion

Why is all this so amazing? Why is this crazy idea worth pursuing? In my opinion, Alex Gaynor explained it very well on his blog: “[PyPy is the future] because [it] is faster, more flexible, and a better platform for Python's growth.”

In short:

- It is fast, because it compiles source code into native code (using the JIT).
- It is flexible, because it adds the JIT to an interpreter with very little extra effort.
- It is flexible (again), because you can write your interpreters in RPython, which is easier to extend than, say, C (so much easier, in fact, that there is even a tutorial on writing your own interpreters).

Appendix: other names you may have heard

Python 3000 (Py3k): an alternative name for Python 3.0, a major, backward-incompatible Python release that appeared in 2008. The Py3k team predicted that it would take about five years for the new version to be fully adopted. And while most (warning: anecdotal claim) Python developers continue to use Python 2.x, people are increasingly thinking about Py3k.

Cython: a superset of Python that includes the ability to call C functions.

- Goal: lets you write C extensions for your Python code.
- It also lets you add static typing to existing Python code, which, once compiled, can help it reach C-like performance.
- It resembles PyPy, but it is not the same. With Cython, you enforce typing in the user's code before passing it to the compiler. With PyPy, you write plain old Python, and the compiler takes care of any optimizations.

Numba: a “specialized just-in-time compiler” that adds JIT to annotated Python code. Simply put, you give it hints, and it speeds up portions of your code. Numba is part of the Anaconda distribution, a set of packages for data analysis and management.

IPython: very different from everything else discussed here. It is a computing environment for Python: interactive, with support for GUI toolkits, a browser-based experience, and so on.

Psyco: a Python extension module and one of the early Python JIT efforts. It has long since been marked as “unmaintained and dead”. Psyco's lead developer, Armin Rigo, now works on PyPy.

Language bindings

RubyPython: a bridge between the Ruby and Python virtual machines. It lets you embed Python code in your Ruby code. You mark where the Python starts and ends, and RubyPython marshals the data between the VMs.
PyObjC: language bindings between Python and Objective-C, acting as a bridge between them. In practice, this means you can use Objective-C libraries (including everything you need to build OS X applications) from your Python code, and Python modules from your Objective-C code. This is convenient because CPython is written in C, which is a subset of Objective-C.
PyQt: while PyObjC lets you bind Python to OS X GUI components, PyQt does the same for the Qt application framework. This makes it possible to build full-fledged graphical interfaces, access SQL databases, and so on. It is another tool aimed at bringing Python's simplicity to other frameworks.

JavaScript frameworks

pyjs (Pyjamas): a framework for building web and desktop applications in Python. It includes a Python-to-JavaScript compiler, a widget set, and some other tools.
Brython: a Python virtual machine written in JavaScript. It lets you run Py3k code in a web browser.
