Optimizations used in Python: list and tuple

Transfer

In Python, there are two similar types - a list (list) and a tuple (tuple). The most famous difference between the two is that tuples are immutable.

You cannot change objects in tuple:

>>> a = (1,2,3)
>>> a[0] = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

But you can modify mutable objects inside the tuple:

>>> b = (1,[1,2,3],3)
>>> b[1]
[1, 2, 3]
>>> b[1].append(4)
>>> b
(1, [1, 2, 3, 4], 3)

Inside CPython (the standard interpreter), the list and the tuple are implemented as a sheet of pointers (links) to Python objects, i.e. physically they do not store objects next to each other. When you delete an object from the list, the link to this object is removed. If someone else refers to the object, then it will continue to be in memory.

Tuples

Despite the fact that tuples are much less common in code and not so popular, this is a very fundamental type that Python constantly uses for internal purposes.

You may not notice, but you use tuples when:

work with arguments or parameters (they are stored as tuples)
return two or more variables from a function
iterating the key values in the dictionary
use string formatting

As a rule, the simplest script uses thousands of tuples:

>>> import gc
>>> deftype_stats(type_obj):...     count = 0... for obj in gc.get_objects():
... if type(obj) == type_obj:
...             count += 1... return count
...
>>> type_stats(tuple)
3136>>> type_stats(list)
659>>> import pandas
>>> type_stats(tuple)
6953>>> type_stats(list)
2455

Empty lists vs empty tuples

An empty tuple works as a singleton, i.e. there is always only one empty tuple in the memory of a running Python script. All empty tuples simply refer to the same object, this is possible due to the fact that tuples are immutable. This approach saves a lot of memory and speeds up the process of working with empty tuples.

>>> a = ()
>>> b = ()
>>> a is b
True>>> id(a)
4409020488>>> id(b)
4409020488>>> # В CPython, функция id возвращает адрес в памяти.

But this does not work with lists, because they can be changed:

>>> a = []
>>> b = []
>>> a is b
False>>> id(a)
4465566920>>> id(b)
4465370632

Optimize memory allocation for tuples

In order to reduce memory fragmentation and speed up the creation of tuples, Python re-uses old tuples that have been deleted. If a tuple consists of less than 20 elements and is no longer used, instead of deleting, Python places it in a special list in which tuples that are available for reuse are stored.

This list is divided into 20 groups, where each group is a list of tuples of size n, where n is from 0 to 20. Each group can store up to 2,000 free tuples. The first group stores only one element and is a list of one empty tuple.

>>> a = (1,2,3)
>>> id(a)
4427578104>>> del a
>>> b = (1,2,4)
>>> id(b)
4427578104

In the example above, we can see that a and b have the same address in memory. This is due to the fact that we instantly occupied a free tuple of the same size.

Optimize memory allocation for lists

Since the lists can be changed, the same optimization as in the case of tuples cannot be done anymore. Despite this, similar optimization is used for lists aimed at empty lists. If an empty list is deleted, it can also be reused later.

>>> a = []
>>> id(a)
4465566792>>> del a
>>> b = []
>>> id(b)
4465566792

Resizing a list

To avoid the overhead of constantly resizing lists, Python does not change its size every time it is required. Instead, each list has a set of additional cells that are hidden to the user, but can later be used for new items. As soon as the hidden cells end, Python adds extra space for new elements. And it does this with a good margin, the number of hidden cells is selected based on the current list size - the larger it is, the more additional hidden slots for new items.

This optimization especially helps when you try to add a lot of elements in the loop.

The list size growth pattern looks like this: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...

For example, if you want to add a new element to the list with 8 elements, then there will be no free cells in it and Python will immediately expand its size to 16 cells, where 9 of them will be occupied and visible to the user.

The size selection formula written in Python:

>>> defget_new_size(n_items):...     new_size = n_items + (n_items // 2 ** 3)
... if n_items < 9:
...             new_size += 3... else:
...             new_size += 6
...
... return new_size
...
>>> get_new_size(9)
16

Speed

If we compare these two types of speed, then on average in the hospital, the tuples are slightly faster than lists. Raymond Hettinger has a great explanation for the speed difference on stackoverflow .

PS: I am the author of this article, you can ask any questions.

Tags: