
Python as I would like to see it
(Translation)
It is no secret that I am not a fan of Python 3 or of the direction the language is currently heading. Over the past few months I have received many emails asking about my vision for Python's future, so I decided to share my thoughts with the community, in the hope of giving future language developers, where possible, something to think about.
One thing can be said for sure: Python is not a perfect programming language. In my opinion, the main problems stem from peculiarities of the interpreter and have little to do with the language itself; however, all of these interpreter quirks gradually become part of the language, and that is why they matter so much.
I want to start the conversation with one oddity of the interpreter (slots) and finish with the biggest mistake in the language's architecture. In essence, this series of posts is an investigation of the decisions baked into the interpreter's architecture and of their influence on both the interpreter and the language. I believe that, from the point of view of overall language design, such articles are far more interesting than simply voicing ideas for improving Python.
Language and implementation
I added this section after the rest of the article was already written. In my opinion, some developers overlook the relationship between Python the language and CPython the interpreter, and believe the two are independent of each other. Yes, there is a language specification, but in many cases it either simply describes what the interpreter does or stays silent on certain points.
With this approach, implicit details of the interpreter implementation directly shape the language design, and even force other Python implementations to adopt certain things. For example, PyPy knows nothing about slots (as far as I know), but is forced to behave as if slots were part of it.
Slots
In my opinion, one of the biggest problems of the language is the idiotic slot system. I am not talking about the __slots__ construct; I mean the internal type slots for special methods. These slots are a "feature" of the language that most people overlook, because few ever have to deal with them directly. Nevertheless, to me the very existence of slots is the biggest problem of the Python language.
So what is a slot? It is a side effect of how the interpreter is implemented internally. Every Python programmer knows about "magic methods" such as __add__: they begin and end with two underscores, with the method name in between. And every developer knows that if you write a + b in code, the interpreter will call a.__add__(b).
Unfortunately, this is not true.
In fact, Python does not work that way internally (at least not in the current implementation). Here is what the interpreter actually does:
- When a type is created, the interpreter finds all descriptors on the class and looks for special methods such as __add__ .
- For each special method found, the interpreter stores a reference to the descriptor in a specially allocated slot on the type; for example, the special method __add__ corresponds to two internal slots: tp_as_number->nb_add and tp_as_sequence->sq_concat .
- When the interpreter wants to evaluate a + b , it calls something like TYPE_OF(a)->tp_as_number->nb_add(a, b) (in reality it is more complicated, because __add__ actually has several slots). A rough Python-level sketch of this dispatch follows the list.
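Here is that rough Python-level sketch. The helper names find_slot and binary_add are invented for this illustration, and several details of the real algorithm (for instance, the rule that gives a subclass's __radd__ priority) are omitted; the point is only that the lookup walks the MRO of the type directly and never consults the instance dictionary or the metaclass's __getattribute__.
def find_slot(tp, name):
    # Roughly what the internal type lookup does: walk the type's MRO and
    # read the class dictionaries directly, bypassing __getattribute__.
    for klass in tp.__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    return None

def binary_add(a, b):
    # Simplified stand-in for the nb_add slot dispatch.
    add = find_slot(type(a), "__add__")
    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result
    radd = find_slot(type(b), "__radd__")
    if radd is not None:
        result = radd(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError("unsupported operand type(s) for +")

class Foo(object):
    def __add__(self, other):
        return 42

print(binary_add(Foo(), Foo()))  # 42
print(binary_add(1, 2))          # 3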
So the operation a + b should be equivalent to type(a).__add__(a, b) ; however, as we have just seen with slots, that is not quite what happens. You can easily verify this yourself: override __getattribute__ on a metaclass and implement your own __add__ method, and you will notice that your __getattribute__ is never consulted for it.
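You can check this with a few lines of Python 3. The names Meta and Obj are made up for this sketch; the expected behavior is described in the comments.
class Meta(type):
    def __getattribute__(cls, name):
        print("metaclass lookup:", name)
        return type.__getattribute__(cls, name)

class Obj(object, metaclass=Meta):
    def __add__(self, other):
        return 42

a, b = Obj(), Obj()
print(a + b)                  # 42, and no "metaclass lookup: __add__" line appears
print(type(a).__add__(a, b))  # explicit attribute access does go through the metaclass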
In my opinion, the slot system is simply absurd. It is an optimization for a few data types (integers, for example), but it makes no sense at all for other objects.
To demonstrate this, I wrote the following pointless class ( x.py ):
class A(object):
    def __add__(self, other):
        return 42
Since we have defined an __add__ method, the interpreter will place it in a slot. So how fast is it? When we evaluate a + b , the slot system is used, and here are the timing results:
$ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a + b'
1000000 loops, best of 3: 0.256 usec per loop
If instead we call a.__add__(b) , the slot system is not used: the interpreter looks in the instance dictionary (where it finds nothing) and then in the class dictionary, where it finds the method. Here is what that measurement looks like:
$ python3 -mtimeit -s 'from x import A; a = A(); b = A()' 'a.__add__(b)'
10000000 loops, best of 3: 0.158 usec per loop
Can you believe it? The version without slots is faster than the version with slots. Magic? I am not entirely sure of the reason for this behavior, but it has been this way for a long, long time. In fact, old-style classes (which had no slots) were much faster than new-style classes and had more features.
More features, you say? Yes, because old-style classes could do this (Python 2.7):
>>> original = 42
>>> class FooProxy:
...     def __getattr__(self, x):
...         return getattr(original, x)
...
>>> proxy = FooProxy()
>>> proxy
42
>>> 1 + proxy
43
>>> proxy + 1
43
Today we have a more complex type system than Python 2 had, and yet fewer options: the code above cannot be reproduced with new-style classes (a quick demonstration follows after the next snippet). In fact, it gets even worse when you consider how lightweight old-style classes used to be:
>>> import sys
>>> class OldStyleClass:
...     pass
...
>>> class NewStyleClass(object):
...     pass
...
>>> sys.getsizeof(OldStyleClass)
104
>>> sys.getsizeof(NewStyleClass)
904
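To make the earlier claim concrete, here is roughly what happens when you try the same proxy trick with a new-style class in Python 3: __getattr__ is consulted for explicit attribute access, but never for the operator itself. The exact error message may vary between versions.
original = 42

class FooProxy(object):                 # new-style class
    def __getattr__(self, name):
        return getattr(original, name)

proxy = FooProxy()
print(proxy.__add__(1))   # 43 -- explicit attribute lookup still reaches the int
try:
    print(proxy + 1)      # never printed
except TypeError as exc:
    print(exc)            # e.g. unsupported operand type(s) for +: 'FooProxy' and 'int'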
Where did the slot system come from?
All of the above raises the question of where slots came from in the first place. As far as I can tell, it is an old tradition. When the Python interpreter was originally written, built-in types such as strings were implemented as global static structures, and those structures had to carry all the special methods an object was supposed to have. This was before the __add__ method itself even existed. If we look at the earliest available version of Python, from 1990, we can see how objects were implemented back then. Here, for example, is what the integer type looked like:
static number_methods int_as_number = {
    intadd,         /*tp_add*/
    intsub,         /*tp_subtract*/
    intmul,         /*tp_multiply*/
    intdiv,         /*tp_divide*/
    intrem,         /*tp_remainder*/
    intpow,         /*tp_power*/
    intneg,         /*tp_negate*/
    intpos,         /*tp_plus*/
};

typeobject Inttype = {
    OB_HEAD_INIT(&Typetype)
    0,
    "int",
    sizeof(intobject),
    0,
    free,           /*tp_dealloc*/
    intprint,       /*tp_print*/
    0,              /*tp_getattr*/
    0,              /*tp_setattr*/
    intcompare,     /*tp_compare*/
    intrepr,        /*tp_repr*/
    &int_as_number, /*tp_as_number*/
    0,              /*tp_as_sequence*/
    0,              /*tp_as_mapping*/
};
As we can see, the tp_as_number slot existed even in the very first version of Python. Unfortunately, some early releases of the interpreter were lost due to repository corruption, so let's look at slightly later versions to see how objects were implemented. This is what the add function looked like in 1993:
static object *
add(v, w)
    object *v, *w;
{
    if (v->ob_type->tp_as_sequence != NULL)
        return (*v->ob_type->tp_as_sequence->sq_concat)(v, w);
    else if (v->ob_type->tp_as_number != NULL) {
        object *x;
        if (coerce(&v, &w) != 0)
            return NULL;
        x = (*v->ob_type->tp_as_number->nb_add)(v, w);
        DECREF(v);
        DECREF(w);
        return x;
    }
    err_setstr(TypeError, "bad operand type(s) for +");
    return NULL;
}
So when did __add__ and the other special methods appear? As far as I can tell, in version 1.1. I managed to compile Python 1.1 on OS X 10.9:
$ ./python -v
Python 1.1 (Aug 16 2014)
Copyright 1991-1994 Stichting Mathematisch Centrum, Amsterdam
Of course, this build is not exactly stable and not everything works as it should, but it gives you an idea of what Python looked like back then. For example, there was a huge difference between objects implemented in C and objects implemented in Python:
$ ./python test.py
Traceback (innermost last):
File "test.py", line 1, in ?
print dir(1 + 1)
TypeError: dir() argument must have __dict__ attribute
As we can see, there was no introspection for built-in types such as integers back then. In fact, the __add__ method was supported only for user-defined classes:
>>> (1).__add__(2)
Traceback (innermost last):
File "", line 1, in ?
TypeError: attribute-less object
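For comparison, the same kind of calls work fine in any modern Python 3:
print((1).__add__(2))        # 3 -- built-in types expose __add__ directly
print('__add__' in dir(1))   # True -- and dir() works on them as well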
This is the legacy we live with in Python today. The basic layout of Python objects has not changed, but over many, many years it has gone through countless refinements, changes, and refactorings.
Modern PyObject
Today many people would argue that the difference between Python's built-in data types implemented in C and objects implemented in pure Python is insignificant. In Python 2.7 the difference is most visible in the default __repr__ : types implemented in Python display themselves as class, while types implemented in C display themselves as type. That difference actually reflects how the type is allocated: statically (type) or dynamically on the heap (class). In practice it had no real consequences, and in Python 3 it disappeared entirely: special methods are placed into slots and vice versa. It would seem there is no difference left between Python and C classes.
However, the difference is still there, and it is very noticeable. Let's figure it out.
As you know, classes in Python are "open": you can look inside them, inspect what they store, and add or remove methods even after the class definition is complete. The built-in interpreter types offer no such flexibility. Why is that?
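A quick illustration of that asymmetry. The class and attribute names here are made up, and the exact wording of the error differs between versions.
class Greeter(object):
    pass

Greeter.hello = lambda self: "hello"    # fine: the class dictionary is mutable
print(Greeter().hello())                # hello

try:
    dict.get2 = lambda self, key: self[key]
except TypeError as exc:
    print(exc)   # e.g. can't set attributes of built-in/extension type 'dict'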
There is no technical obstacle to attaching a new method to, say, the dict type. The reason the interpreter does not let you do this actually has little to do with protecting the programmer's sanity; the real reason is that the built-in data types do not live on the heap. To appreciate the far-reaching consequences of this, you first need to understand how the Python interpreter starts up.
Damn interpreter
Starting the Python interpreter is a very expensive operation. When you run the executable, you set in motion a massive machine that does a bit of everything: it initializes the built-in data types and the import machinery, imports some required modules, talks to the operating system to set up signal handling and command-line arguments, initializes the internal interpreter state, and so on. Only when all of that is done does the interpreter run your code, and then it shuts everything down again. Python has been working this way for about 25 years now.
Here is what it looks like in pseudo-code:
/* called once */
bootstrap()
/* these three lines can be called in a loop if you like */
initialize()
rv = run_code()
finalize()
/* called once */
shutdown()
The problem is that the interpreter keeps a huge amount of global state, so in practice there is exactly one interpreter. A far better architecture would be to initialize an interpreter object and run code on it, something like this:
interpreter *iptr = make_interpreter();
interpreter_run_code(iptr);
finalize_interpreter(iptr);
This is how other dynamic languages such as Lua and JavaScript work. The crucial point is that you can then have two interpreters. What a novel concept!
Who needs multiple interpreters at all? You would be surprised, but even Python needs this, or at least it would be useful. Existing examples include applications that embed Python, such as web applications running under mod_python: they definitely need to run in an isolated environment. Yes, Python has subinterpreters, but they operate inside the main interpreter, and only because so much of Python is tied to global internal state. The biggest piece of code dealing with that internal state is also the most controversial one: the global interpreter lock (GIL). Python sticks to the single-interpreter concept because there is a huge amount of data shared by all subinterpreters. Every one of them needs exclusive access to that data, so the lock lives in the interpreter itself. What data are we talking about?
If you look at the code above, you will see all those huge structures declared as global variables. In fact, the interpreter exposes these structures directly to Python code, using the macro OB_HEAD_INIT(&Typetype) to set up the header the interpreter needs in order to work with them; among other things, that header contains the object's reference count.
Do you see where this is going? These objects are shared by all subinterpreters. Now imagine we could modify any of these objects from Python code: two completely independent Python programs that share nothing could then affect each other's state. Imagine if the JavaScript code in your Facebook tab could change the implementation of the built-in Array object, and the tab with Google would immediately see those changes.
This is an architectural decision from 1990 that continues to influence the language to this day.
On the other hand, the immutability of the built-in types was, on the whole, welcomed by the Python community: the problems caused by mutable built-in types are well known from other languages, and frankly, we have not lost all that much.
However, there is one more thing.
What is a vtable?
So, in Python, the built-in (C-implemented) data types are practically immutable. How else do they differ? Another difference is the "openness" of Python classes. Methods of classes implemented in Python are "virtual": there is no "real" virtual method table as in C++; all methods are stored in the class dictionary and found by the lookup algorithm. The consequences are obvious: when you inherit from an object and override a method, it is quite likely that another method will be indirectly affected, because it calls the overridden one.
Collections with convenience methods are a good example. Dictionaries in Python have two methods for retrieving an object: __getitem__() and get(). When you implement a class in Python, you usually implement one in terms of the other, for example by writing return self.__getitem__(key) inside the get(key) method.
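For a pure Python mapping, that pattern looks roughly like this (a minimal sketch, not a complete mapping implementation; the class name Mapping is made up):
class Mapping(object):
    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def get(self, key, default=None):
        try:
            return self.__getitem__(key)   # get() is written in terms of __getitem__
        except KeyError:
            return default

m = Mapping({"a": 1})
print(m["a"], m.get("a"), m.get("b", 0))   # 1 1 0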
For types implemented inside the interpreter, things are different. The reason, once again, is the difference between slots and the class dictionary. Say you want to implement a dictionary in the interpreter, and one of your goals is to reuse existing code, so you want get to call __getitem__ . What do you do?
A Python method in C is simply a C function with a particular signature, and that is the first problem. The main job of such a function is to handle the arguments coming from Python code and convert them into something usable at the C level: at the very least, it has to unpack the call arguments from the Python tuple and dictionary (args and kwargs) into local variables. The usual pattern is that dict__getitem__ only parses the arguments and then calls a helper such as dict_do_getitem with the actual parameters. And here you can see the problem: dict__getitem__ and dict_get both call into the same internal static C function, and there is nothing you can do about that.
There is no good way around this limitation, and the reason is the slot system. There is no sane way for the interpreter to route such calls through a vtable, and the reason for that is the GIL. The dictionary (dict) presents an API of atomic operations to the outside world, and that promise stops making sense the moment those operations start going through a vtable. Why? Because such a call could end up running Python-level code, the operation would no longer be atomic, and that would immediately cause serious problems.
Imagine the fun of a dict subclass overriding the internal dict_get with something that triggers a lazy import: you throw every guarantee out of the window. Then again, perhaps we should have done that a long time ago?
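The user-visible consequence is easy to demonstrate: overriding __getitem__ on a dict subclass does not change what get() returns, because dict.get never dispatches through the slot. A small sketch, with the expected output in the comments:
class MyDict(dict):
    def __getitem__(self, key):
        return 42                # override item lookup

d = MyDict(a=1)
print(d["a"])      # 42 -- our override is used for the [] operator
print(d.get("a"))  # 1  -- dict.get goes straight to the internal C lookup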
Conclusion
In recent years there has been a clear trend toward making the Python language more complex. I would like to see the opposite.
I would like the internal architecture of the interpreter to be built on independent subinterpreters with their own local sets of base types, similar to how JavaScript engines work. This would open up tremendous opportunities for embedding and for multithreading based on message passing. Processors are not getting any faster.
Instead of slots plus dictionaries acting as the vtable, let's experiment with just dictionaries. Objective-C is a language built entirely on message passing, and that has clearly paid off in call performance: from what I can see, method calls in Objective-C are much faster than calls in Python. Strings are interned in Python anyway, which makes comparing them fast. I am willing to bet this approach would be no worse, and even if it slowed down the built-in types, the result would be a much simpler architecture that is easier to optimize.
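As a purely illustrative toy (not a proposal for an actual implementation, and all names here are invented), "dictionaries instead of a vtable" could look something like this: every call is just a lookup of an interned string in the type's method table.
class MessageType(object):
    def __init__(self, name, methods):
        self.name = name
        self.methods = dict(methods)        # the entire "vtable" is a plain dict

    def send(self, receiver, selector, *args):
        return self.methods[selector](receiver, *args)

number = MessageType("number", {"add": lambda self, other: self + other})
print(number.send(3, "add", 4))   # 7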
Look through the Python source code and see how much extra code is needed just to make the slot system work: it's unbelievable! I am convinced it was a bad idea and we should have abandoned it long ago. Dropping slots would benefit even PyPy, whose authors, I am sure, have to bend over backwards to make their interpreter behave compatibly with CPython in this respect.
Translated by Dreadatour, proofread by %username%.