Unladen Swallow - it's over ...
From the translator: A couple of hours ago, Guido mentioned on Twitter a blog post by his colleague, one of the (former) Unladen Swallow developers, in which he tells the sad story of Unladen Swallow's vibrant but short life at Google.
Original: Reid Kleckner - Unladen Swallow Retrospective
Unladen Swallow: retrospective
I started writing this while I was at PyCon, but I've been updating the text a bit since then. Anyway, here it is. :)
As is now obvious by this point, no one is seriously working on Unladen Swallow anymore, either on the project itself or on porting it to py3k. Why?
Loss of customer interest
The main reason is that Google did not find enough potential users for Unladen Swallow. There are several reasons for this:
- Most of Google's Python code does not require high performance. Python is mainly used for tools and prototyping, while the main user-facing applications are written in Java and C++.
- The internal users who did care about performance were put off by how complicated it was to deploy Unladen Swallow. For them it was not enough that Unladen Swallow replace CPython: any new Python implementation had to be a drop-in replacement for the existing one. We thought that by basing our code on CPython, rather than starting from scratch, we would avoid this problem, since all C extensions and SWIG wrappers would keep working as if nothing had changed. In practice, though, even upgrading from an earlier version of CPython to Python 2.6 proved too complicated.
- One way or another, our potential users found other solutions to their performance problems that turned out to be easier for them to deploy.
After my internship ended, I tried to make Unladen the topic of my master's work at MIT, but my advisor felt that the results achieved so far were not promising, and that the techniques I wanted to apply were no longer considered novel. The core methods (translator's note: generating optimized code based on observing unoptimized execution) had already been explored: feedback-directed optimization was implemented by Urs Hölzle for Smalltalk, and tracing by Andreas Gal for Java. Of course, this does not mean that no one can invent new methods, but at the time I had no fresh ideas.
Loss of our own interest
All of the above came to a head in the first quarter of 2010. We could still have worked on Unladen in our free time, but that would have been a different story.
First, working on a project alone is nowhere near as much fun as working with other people, especially when it is not clear whether your creation will ever have users.
Second, one of the main reasons for our interest was that we thought PyPy would never even try to support C extensions or SWIG-wrapped code. We were very surprised to learn that the PyPy project had started moving in that direction. That partially removed the need for a pluggable JIT for CPython. Also, when our project started, PyPy did not yet support 64-bit platforms, but in time they added that support too.
Finally, the comments we read on python-dev did not reassure us. People assumed that if Unladen Swallow landed in py3k, Google would maintain the code, but by then those assumptions were groundless. Had the code been merged, the JIT would presumably have been disabled by default, and after a year, due to rare use and lack of maintenance, it would have had to be removed again. Very few developers were interested in the new JIT. We never finished merging the code, but we hoped that if we did, we could inspire CPython developers to pick it up.
So, given all the reasons above why none of us works on Unladen anymore, what have we learned?
Conclusions about LLVM
First, we learned a great deal about the pros and cons of using LLVM to generate JIT code. Our initial decision to use LLVM was made because none of us knew x86 assembly deeply enough, while we wanted to support x86, x86_64, and perhaps even ARM. We considered adapting Psyco, but rejected the idea mainly because of the deep x86 knowledge it required.
Unfortunately, LLVM in its current state is focused on use as a static optimizer and backend. LLVM's code generation and optimization are good, but too expensive for JIT use. Its optimizations are tuned to work on intermediate representation generated by static C-like languages, while most of the important optimizations for Python require a high-level view of how the program behaved on previous iterations, and that was hard to express in LLVM.
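Unladen Swallow's actual feedback machinery lived inside the CPython runtime; as a purely illustrative sketch (the class, thresholds, and names here are all invented for this post), the idea of "watch what types a site sees, then specialize once it is monomorphic" can be shown in a few lines of Python:

```python
# Toy sketch of type feedback at one operation site: record operand
# types during "unoptimized" runs, then install a specialized fast
# path once the site proves monomorphic. Guard on the types and fall
# back to the generic path (deoptimize) if the guess turns out wrong.
import operator

class FeedbackSite:
    def __init__(self, generic_op, threshold=100):
        self.generic_op = generic_op   # fallback that handles any types
        self.seen = {}                 # (type, type) -> observation count
        self.total = 0
        self.threshold = threshold
        self.fast_path = None          # set once we commit to one type pair
        self.fast_types = None

    def execute(self, a, b):
        if self.fast_path is not None:
            # guard: the specialized code is only valid for the seen types
            if type(a) is self.fast_types[0] and type(b) is self.fast_types[1]:
                return self.fast_path(a, b)
            self.fast_path = None      # guard failed: deoptimize
        key = (type(a), type(b))
        self.seen[key] = self.seen.get(key, 0) + 1
        self.total += 1
        if self.total == self.threshold and len(self.seen) == 1:
            if key == (int, int):      # site is monomorphic: specialize
                self.fast_types = key
                self.fast_path = lambda x, y: x + y
        return self.generic_op(a, b)

site = FeedbackSite(operator.add, threshold=10)
for i in range(20):
    site.execute(i, i)
print(site.fast_path is not None)   # the site specialized for (int, int)
```

A real JIT would compile the fast path to machine code rather than install a lambda, but the shape of the problem is the same: the decision depends on runtime history that a static IR optimizer never sees.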
An example of where a high-level view matters in code generation is optimizing Python stack operations. LLVM cannot prove that a load from the Python stack may be reused across a call into external functions (i.e., into the CPython runtime), which means in practice it never optimizes such loads. To solve this we eventually had to write our own alias analysis, which is a typical example of what you run into when you do not build your own code generator.
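The analysis in question was written against LLVM IR; as a language-neutral illustration (the toy IR, function, and call names below are invented), the core issue is that a cached stack load is only reusable across a call if the optimizer knows the call cannot rewrite the frame:

```python
# Toy redundant-load elimination over a tiny stack-machine IR.
# A call may re-enter the runtime and rewrite frame slots, so cached
# loads must be invalidated across it -- unless alias information
# proves the callee cannot touch the Python stack. Stores are omitted
# for brevity.

def eliminate_redundant_loads(ops, calls_that_cant_touch_stack=()):
    known = {}   # slot index -> temp name currently holding its value
    out = []
    for op in ops:
        kind = op[0]
        if kind == "load":                 # ("load", dst, slot)
            _, dst, slot = op
            if slot in known:
                out.append(("copy", dst, known[slot]))  # reuse cached value
            else:
                out.append(op)
                known[slot] = dst
        elif kind == "call":               # ("call", name)
            out.append(op)
            if op[1] not in calls_that_cant_touch_stack:
                known.clear()              # conservatively forget everything
        else:
            out.append(op)
    return out

ir = [("load", "t0", 0), ("call", "helper"), ("load", "t1", 0)]
# without alias info the second load must stay:
print(eliminate_redundant_loads(ir))
# with alias info the second load becomes a cheap copy of t0:
print(eliminate_redundant_loads(ir, calls_that_cant_touch_stack={"helper"}))
```

A generic IR-level optimizer sees only an opaque call and has to take the conservative branch every time; the Python-specific knowledge of which runtime calls are harmless is exactly what the custom alias analysis supplied.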
LLVM has other limitations as well. For example, it has no real support for back-patching (translator's note: modifying already-generated code in place), which PyPy uses to patch the exits of guard branches in running code on the fly. This is a fairly important requirement, as is LLVM's significant memory consumption, though I would dispute the latter point: Steven Noonan's GSoC results showed that consumption can be reduced, especially considering that PyPy's memory consumption was higher.
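For readers unfamiliar with the trick: in PyPy's terminology a hot guard failure gets its own compiled side trace (a "bridge"), and the guard's exit branch is rewritten in place to jump to it. A toy Python sketch of the control flow (the class, threshold, and labels are invented for illustration; real back-patching rewrites machine code, not an attribute):

```python
# Toy guard back-patching: a compiled trace initially exits to the
# interpreter on guard failure; once a failure gets hot, a bridge is
# "compiled" and the exit is patched in place so later failures skip
# the interpreter entirely.

class Guard:
    def __init__(self, check, hot_threshold=3):
        self.check = check
        self.hot_threshold = hot_threshold
        self.failures = 0
        self.bridge = None                 # back-patched in later

    def run(self, value, interpreter_fallback):
        if self.check(value):
            return "fast-path"
        self.failures += 1
        if self.bridge is not None:
            return self.bridge(value)      # patched exit: jump to bridge
        if self.failures >= self.hot_threshold:
            # failure is hot: compile a bridge and back-patch the exit
            self.bridge = lambda v: "bridge-path"
        return interpreter_fallback(value)

g = Guard(lambda v: isinstance(v, int))
results = [g.run(x, lambda v: "interp") for x in [1, "a", "b", "c", "d"]]
print(results)
```

Doing the equivalent in LLVM would mean regenerating and relinking whole functions, which is far more expensive than patching a single branch target.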
I also spent a summer building an interface between the LLVM JIT and gdb. This was not strictly necessary, but it turned out to be a useful tool. I don't know what is currently used to debug PyPy, but our experience here could be applied there.
Results
Personally, I had taken courses on compilers and operating systems before starting this project, but the work itself taught me a huge number of new skills. I am now very comfortable with gdb: I have patched it for my own needs and even debugged gdb with gdb. I also know much more about x86, compiler optimization techniques, and how JITs work, and I draw on all of this in my master's work.
I am also very proud of our macro-benchmark suite of real Python applications, which the PyPy project now actively uses: speed.pypy.org. Of all my Python performance work, it has proven the most useful. Any change in performance is immediately visible by comparing the results before and after it lands.
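In miniature, the before/after comparison such a suite enables looks like the sketch below (the workloads here are invented stand-ins, not benchmarks from the actual suite):

```python
# Minimal before/after benchmark comparison in the spirit of the
# Unladen Swallow / speed.pypy.org harness: run each version of a
# workload several times, keep the best time, and report the ratio.
import timeit

def join_with_concat(n):
    s = ""
    for i in range(n):
        s += str(i)              # repeated string concatenation
    return s

def join_with_join(n):
    return "".join(str(i) for i in range(n))

def bench(fn, n=2000, repeats=5):
    # best-of-N, as most benchmark harnesses report, to reduce noise
    return min(timeit.repeat(lambda: fn(n), number=20, repeat=repeats))

before = bench(join_with_concat)
after = bench(join_with_join)
print(f"before: {before:.4f}s  after: {after:.4f}s  ratio: {before / after:.2f}x")
```

The real suite does the same thing at the scale of whole applications (template engines, pickling workloads, and so on), which is what makes a regression or speedup in an interpreter change hard to miss.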
We also contributed improvements to LLVM itself, which helps other LLVM-based JIT projects such as Parrot and Rubinius. For example, I helped fix the 16-megabyte limit on JIT-compiled code, and there is now the gdb interface for the LLVM JIT. Jeff also spent a lot of time making it possible to compile C functions directly into the JIT-generated code, fixing memory leaks, and adding the TypeBuilder template for building C types in LLVM.
So, much as I wish we had had more resources and the project had survived, it gave me great experience, we managed to contribute back to LLVM, and we created a very useful benchmark suite.