
Memory and numbers in Python

I used to work with C-like languages, but recently I had to sit down with Python. The syntax came easily, and then it was time for the tricky questions. Below the cut is an article about how Python stores data in memory. I do not claim to have the final truth; I am just trying to figure it out.
Looking at references
Let's start with the simplest thing. All data in Python is an object, and every variable is a reference to an object. There is no data that is not an object. First we need a way to tell whether two “identical” objects are actually one and the same. For that we need the object's address, which the built-in id() function readily provides (in CPython, id() returns the memory address). Let's try:
print(id(0))
As expected, it prints something unintelligible. The large number probably really is an address. But if every number used anywhere were stored in memory separately, there would never be enough memory. A quick experiment:
print(id(0))
print(id(0))
Two absolutely identical numbers. So constants like this one are stored in memory without duplication. That makes sense: Python is slow enough already, and a trick like this helps save what performance is left. Okay, let's try to fill all the memory with a huge list of zeros.
a = [0]
while True:
    a += [0]
The infinite loop, as expected, runs forever, but needs very little memory per element, since each iteration only adds one more reference to the same 0. Another experiment:
a = [0, 0]
print(id(a[0]))
print(id(a[1]))
Yes, the same number. To confirm, I ran the same test with two separate variables: again the same number, and it even equals id(0). So the mechanism appears to be this: when a variable is assigned a value, Python checks whether such an object already exists in memory and, if so, points the reference at it. This is presumably needed because an object takes up quite a lot of memory, so Python reuses existing objects wherever it can. Note that CPython only guarantees this reuse for a small set of values (integers from -5 to 256, True, False, None); anything beyond that depends on interning and on how constants are compiled. To avoid cluttering the article with code, I will just say that in my tests strings (including ones obtained by slicing), booleans, and even lists behaved the same way. For short-lived objects such as lists, though, a matching id can also mean that the address of a freed object was reused, so it is weaker evidence. Let's make a second attempt to take all the memory in Python:
i = 0
a = [0]
while True:
    a += [a[i]]
    i += 1
Success! Memory consumption grows steadily. We can draw the first conclusions:
1. Any data in Python is an object.
2. If objects are “the same” small immutable values, CPython often stores them at the same address in memory. This is an implementation optimization, not a rule: id(a) == id(b) implies that a and b are the same object, but a == b does not imply id(a) == id(b).
3. No more sophisticated optimization is applied: even a fairly simple pattern in a list (here just the rule a[i+1] = a[i]) is not optimized in any way. I would have been surprised if it were: that would require fairly complex analysis, which Python, with its step-by-step interpretation, cannot afford.
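The conclusions above can be checked with a short script. This is a minimal sketch, assuming CPython on a typical build; the variable names are made up for illustration, and it uses the `is` operator, which compares identity in the same way as comparing id() values.

```python
import sys

# Small integers are cached by CPython (an implementation detail),
# so both zeros in this list are the same object.
zeros = [0, 0]
same_small_int = zeros[0] is zeros[1]

# Equal values do not have to be the same object: two lists built
# separately compare equal but are distinct objects.
first = [1, 2, 3]
second = [1, 2, 3]
equal_lists = first == second
same_list_object = first is second

# A list stores references, not copies. A million references to the
# one cached 0 cost roughly one pointer per element.
many_zeros = [0] * 1_000_000
bytes_per_element = sys.getsizeof(many_zeros) / len(many_zeros)

print(same_small_int, equal_lists, same_list_object)
print(round(bytes_per_element))
```

On a 64-bit build the last line should come out close to the pointer size, which is why a list of identical numbers still grows as elements are appended even though no new number objects are created.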
Counting references
Disclaimer: from here on we work in Python's interactive mode. To count the references to an object there is the sys.getrefcount() function. Import sys:
>>> import sys
First we need to find out how closely the numbers it reports match reality:
>>> sys.getrefcount('There is no this string in Python')
3
>>> sys.getrefcount('9695c3716e3b801367b7eca6a3281ac9') # MD5 hash of 512 random bytes from /dev/urandom
3
>>> a = 'More random for the random god!'
>>> sys.getrefcount(a)
2
>>> a = 0
>>> sys.getrefcount(a)
434
>>> sys.getrefcount(0)
436
This tells us something funny: while counting references, getrefcount() creates some of its own. As we can see, for literals the result is too high by two (really two; I checked on a large amount of input, which I am not publishing here), so we can simply subtract 2. For a variable the result of 2 is the variable itself plus one temporary reference: the documentation notes that the returned count is generally one higher than expected, because the argument passed to getrefcount() is itself a reference. So much for how the results deviate from reality. Now a few examples:
>>> sys.getrefcount(1)
754
>>> sys.getrefcount(65)
13
>>> sys.getrefcount(67)
11
>>> sys.getrefcount('A')
4
>>> sys.getrefcount('a')
6
>>> sys.getrefcount(False)
100
>>> sys.getrefcount(True)
101
# Truth wins!
Why do references suddenly appear out of nowhere (around 750 of them for the integer 1)? Because this function counts C-level pointers, that is, it includes the references held inside the Python interpreter itself. In effect, we are brazenly poking into the part of Python that its developers try to hide from us.
So that is what goes on backstage in Python. If I get around to it, I may write about what happens if you try to modify these objects by hand, say through OllyDbg.
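To see the counter react to our own references rather than the interpreter's, it helps to use an object that nothing else refers to. This is a minimal sketch, assuming CPython; the names marker and holders are invented for the example, and only the differences between counts are compared, since the absolute values can vary between versions.

```python
import sys

# A fresh object that nothing else in the interpreter refers to.
marker = object()

# Baseline: the name `marker` plus whatever temporary reference
# getrefcount() itself adds while it runs.
baseline = sys.getrefcount(marker)

# Each list slot holding the object adds one reference.
holders = [marker, marker, marker]
with_holders = sys.getrefcount(marker)

# Deleting the list releases those three references again.
del holders
after_delete = sys.getrefcount(marker)

print(with_holders - baseline, after_delete - baseline)
```

Unlike 0, 1 or True, this object is not shared with the interpreter or with imported modules, so the count moves only when our own code adds or drops a reference.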