Pointers in Python: what's the point?

Original author: Logan Jones

If you've ever worked with low-level languages like C or C++, you have probably heard of pointers. They let you write very efficient code, but they can also confuse beginners, and even experienced developers, and lead to memory management bugs. So are there pointers in Python, and if not, can you emulate them?

Pointers are widely used in C and C++. Essentially, they are variables that hold the memory address of another variable. To brush up on pointers, read this review.

After reading this article, you will better understand Python's object model and see why pointers do not really exist in the language. For the cases where you need pointer-like behavior, you will learn how to emulate it without the accompanying memory management nightmare.

In this article, you will:

  • Learn why Python doesn't have pointers.
  • Learn the difference between C variables and Python names.
  • Learn to emulate pointers in Python.
  • Use ctypes to experiment with real pointers.

Note: In this article, “Python” means the reference implementation written in C, known as CPython. Everything said about the internals of the language holds for CPython 3.7 and may not hold for later versions.

Why aren't there pointers in Python?


I do not know. Could pointers exist natively in Python? Probably, but they seem to go against the Zen of Python, because they encourage implicit changes rather than explicit ones. Pointers are also often quite complex, especially for beginners. Worse, they tempt you into poor decisions or into doing something really dangerous, such as reading from a region of memory you were not supposed to read.

Python tries to abstract implementation details, such as memory addresses, away from the user. The language often favors usability over speed, so pointers do not make much sense in Python. But don't worry: by default, Python gives you some of the benefits of pointers.

To understand pointers in Python, let's briefly go over the features of the language implementation. In particular, you need to understand:

  1. What mutable and immutable objects are.
  2. How variables/names work in Python.

Hold on to your memory addresses, let's go!

Objects in Python


Everything in Python is an object. To see this, open a REPL and try isinstance():

>>> isinstance(1, object)
True
>>> isinstance(list(), object)
True
>>> isinstance(True, object)
True
>>> def foo():
...    pass
...
>>> isinstance(foo, object)
True

This code demonstrates that everything in Python really is an object. Each object contains at least three pieces of data:

  • A reference count.
  • A type.
  • A value.

The reference count is used for memory management. This is covered in detail in Memory Management in Python. The type is used at the CPython level to ensure type safety at runtime. And the value is the actual value associated with the object.
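As a rough illustration of these three pieces of data, the sketch below inspects an object from Python code. It assumes CPython, where sys.getrefcount() reports the reference count; the variable names are purely illustrative and not part of the original article.

```python
import sys

numbers = [1, 2, 3]              # a fresh list object; its value is [1, 2, 3]

# Type: recorded on the object itself and checked at runtime.
obj_type = type(numbers)

# Reference count: CPython-specific. The call itself holds a temporary
# reference, so only the difference between two readings is meaningful here.
count_before = sys.getrefcount(numbers)
alias = numbers                  # bind a second name to the same object
count_after = sys.getrefcount(numbers)

print(obj_type)                      # <class 'list'>
print(count_after - count_before)    # 1 on CPython
```

Binding a second name does not copy the list; it only raises the count of references to the one existing object.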

But not all objects are the same. There is one important distinction: some objects are mutable and others are immutable. Understanding this distinction will help you peel back the first layer of the onion called "pointers in Python."

Mutable and immutable objects


There are two kinds of objects in Python:

  1. Immutable objects (cannot be changed);
  2. Mutable objects (can be changed).

Recognizing this difference is the first key to finding your way around pointers in Python. Here is a summary of whether some common types are immutable:

Type        Immutable?
int         Yes
float       Yes
bool        Yes
complex     Yes
tuple       Yes
frozenset   Yes
str         Yes
list        No
set         No
dict        No

As you can see, many of the commonly used primitive types are immutable. You can verify this yourself by writing some Python code. You will need two tools:

  1. id() returns the memory address of the object (in CPython);
  2. is returns True if and only if two objects have the same memory address.

You can run this code in a REPL environment:

>>> x = 5
>>> id(x)
94529957049376

Here we assigned the value 5 to the variable x. If you try to change the value with addition, you get a new object:

>>> x += 1
>>> x
6
>>> id(x)
94529957049408

Although this code appears to simply change the value of x, in reality you get back a new object.

The str type is also immutable:

>>> s = "real_python"
>>> id(s)
140637819584048
>>> s += "_rocks"
>>> s
'real_python_rocks'
>>> id(s)
140637819609424

In this case too, s ends up with a different memory address after the += operation.

Bonus: The += operator translates into different method calls.

For some objects, such as list, += translates into __iadd__() (in-place add). The object modifies itself and keeps the same ID. However, str and int do not have this method, so __add__() is called instead of __iadd__().

See the Python data model documentation for more details.
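The difference between the two code paths can be checked directly. The following is a minimal sketch; the variable names are made up for illustration, and an extra alias is kept for each object so that the original stays observable after +=.

```python
# list defines __iadd__, so += mutates the object in place.
items = [1, 2]
items_alias = items
items += [3]
list_same_object = items is items_alias       # True: the alias also sees [1, 2, 3]

# str defines no __iadd__, so += falls back to __add__ and rebinds the name.
text = "real_python"
text_alias = text
text += "_rocks"
str_same_object = text is text_alias          # False: text_alias is still "real_python"

# Which of these types implement the in-place hook at all?
has_iadd = {t.__name__: hasattr(t, "__iadd__") for t in (list, str, int)}

print(list_same_object, str_same_object)      # True False
print(has_iadd)                               # {'list': True, 'str': False, 'int': False}
```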

When we try to change the string s directly, we get an error:

>>> s[0] = "R"

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

The code above fails, and Python reports that str does not support this kind of modification, which is consistent with str being an immutable type.

Compare with a mutable object, for example, with a list:

>>> my_list = [1, 2, 3]
>>> id(my_list)
140637819575368
>>> my_list.append(4)
>>> my_list
[1, 2, 3, 4]
>>> id(my_list)
140637819575368

This code demonstrates the main difference between the two kinds of objects. Initially, my_list has an ID. Even after 4 is appended to the list, my_list still has the same ID. That is because list is a mutable type.

Here is another demonstration of list mutability using assignment:

>>> my_list[0] = 0
>>> my_list
[0, 2, 3, 4]
>>> id(my_list)
140637819575368

In this code, we modified my_list and set its first element to 0. However, the list kept the same ID after this operation. The next step on our path is to understand how Python handles variables.

Understanding Variables


Variables in Python are fundamentally different from variables in C and C++. Strictly speaking, Python does not have variables at all. It has names instead.

This may sound pedantic, and for the most part it is. Most of the time you can think of Python names as variables, but you need to understand the difference. This matters especially when you study a tricky topic like pointers.

To make this easier to understand, let's look at how variables work in C and what they represent, and then compare that with how names work in Python.

Variables in C


Take the code that defines the variable x:

int x = 2337;

This one short line goes through several distinct steps when executed:

  1. Allocate enough memory for an integer.
  2. Store the value 2337 in that memory location.
  3. Indicate that x refers to that value.

In a simplified view of memory, the variable x has a fake address 0x7f1 and holds the value 2337. If you later want to change the value of x, you can do this:

x = 2338;

This code assigns the new value 2338 to the variable x, overwriting the previous value. This means that the variable x is mutable. In the updated memory layout, the location 0x7f1 now holds 2338.

Note that the location of x has not changed, only the value itself. This is important: it tells us that x is the memory location, not just a name for it.

You can also look at this in terms of ownership. In a sense, x owns its memory location. At first, x is an empty box that can hold exactly one integer.

When you assign a value to x, you put that value into the box that x owns. If you want to introduce a new variable y, you can add this line:

int y = x;

This code creates a new box called y and copies the value from x into it. In the memory layout there are now two boxes: x at 0x7f1 and y at a new location, 0x7f5, each holding 2337.

Although the value of x was copied to y, the variable y owns its own address in memory. Therefore, you can overwrite the value of y without affecting x:

y = 2339;

In the resulting memory layout, y holds 2339 and x still holds 2337. Once again, you changed the value of y but not its location, and you did not affect the original variable x in any way.

With names in Python, the situation is completely different.

Names in Python


Python does not have variables; it has names. You can use the term “variables” if you like, but it is important to know the difference between variables and names.

Let's take the equivalent code from the above C example and write it in Python:

>>> x = 2337

As in C, this code goes through several distinct steps when executed:

  1. A PyObject is created.
  2. The typecode of the PyObject is set to integer.
  3. The value of the PyObject is set to 2337.
  4. A name x is created.
  5. x is pointed at the new PyObject.
  6. The reference count of the PyObject is incremented by 1.

Note: PyObject is not the same as Python's object. It is specific to CPython and represents the base structure of all Python objects.

PyObject is defined as a C struct, so if you wonder why you cannot access the typecode or the reference count directly, the reason is that you do not have direct access to these structures. Calls such as sys.getrefcount() can give you a glimpse of some of the internals.

In memory, it might look something like this: the name x points to a PyObject that holds the type integer, the value 2337, and a reference count of 1.

This memory layout is very different from the C layout shown earlier. Instead of x owning a block of memory in which the value 2337 is stored, the freshly created Python object owns the memory where 2337 lives. The Python name x does not directly own any memory address the way a C variable owns a static slot in memory.

If you want to bind x to a new value, try this code:

>>> x = 2338

What happens here differs from the C equivalent, but it is not too different from the original binding in Python.

In this code:

  • A new PyObject is created.
  • The typecode of the PyObject is set to integer.
  • The value of the PyObject is set to 2338.
  • x is pointed at the new PyObject.
  • The reference count of the new PyObject is incremented by 1.
  • The reference count of the old PyObject is decremented by 1.

In the resulting memory layout, x points to the new object holding 2338 and, as before, does not own a memory area. You can also see that x = 2338 is not an assignment but rather a binding of the name x to a reference.

In addition, the previous object (holding the value 2337) now sits in memory with a reference count of 0 and will be cleaned up by the garbage collector.
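One way to watch an object disappear once its last name is rebound is a weak reference. The sketch below assumes CPython, where an object is freed as soon as its reference count reaches 0; the Payload class is a hypothetical stand-in, used because plain int objects cannot be weakly referenced.

```python
import weakref

class Payload:
    """Hypothetical stand-in object (ints do not support weak references)."""

x = Payload()
watcher = weakref.ref(x)      # a weak reference does not raise the reference count
alive_before = watcher() is not None

x = Payload()                 # rebind x: the first object loses its last reference
alive_after = watcher() is not None

print(alive_before, alive_after)   # True False on CPython
```

Other Python implementations may reclaim the old object later, so the second value is only reliable on CPython.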

You can enter a new name y, as in the C example:

>>> y = x

This creates a new name in memory, but not necessarily a new object. Here no new Python object was created: there is only a new name, y, that points to the same object as x, and that object's reference count has gone up by 1. You can compare the identity of the two names to confirm that they refer to the same object:

>>> y is x
True

This code shows that x and y are the same object. But make no mistake: y is still immutable. For example, you can perform an addition on y:

>>> y += 1
>>> y is x
False

The addition returns a new Python object. In memory there are now two objects: x still points to the one holding 2338, while y points to a newly created object holding 2339. Interestingly, you would reach exactly the same final state by binding y directly to 2339:

>>> y = 2339

After this statement, memory ends up in the same final state as after the addition. To recap: in Python you do not assign variables, you bind names to references.
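The whole sequence can be condensed into a few lines. This is a minimal, self-contained sketch that starts from 2337 rather than continuing the REPL session above; the helper names shared_before and shared_after are only illustrative.

```python
x = 2337
y = x                      # y is bound to the same object as x
shared_before = y is x     # True: two names, one object

y += 1                     # int is immutable: y is rebound to a new object
shared_after = y is x      # False: x was never touched

print(shared_before, shared_after, x, y)   # True False 2337 2338
```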

A Note on Interned Objects in Python


Now you understand how new objects are created in Python and how names are attached to them. It's time to talk about interned objects.

Suppose you have this Python code:

>>> x = 1000; y = 1000
>>> x is y
True

As before, x and y are names that point to the same Python object. (The two assignments are entered on one line so that CPython compiles them together and reuses a single constant object.) But the object holding the value 1000 is not guaranteed to always have the same memory address. For example, if you add two numbers to get 1000, you end up with a different address:

>>> x = 1000
>>> y = 499 + 501
>>> x is y
False

This time the line x is y returns False. If this is confusing, don't worry. Here is what happens when this code is executed:

  1. A Python object (1000) is created.
  2. The name x is assigned to it.
  3. A Python object (499) is created.
  4. A Python object (501) is created.
  5. These two objects are added together.
  6. A new Python object (1000) is created.
  7. The name y is assigned to it.

Technical note: These steps take place only when the code is executed inside a REPL. If you put the example above into a file and run it, the line x is y returns True.

The reason is that the CPython compiler is smart: it applies peephole optimizations that save execution steps wherever possible. For details, see the source code of CPython's peephole optimizer.
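One way to observe this optimization is to compile the statement without running it and look at the constants stored in the resulting code object. This is a hedged sketch that assumes CPython 3.7 or later; the "<example>" filename is arbitrary, and the exact contents of co_consts are an implementation detail that may differ between versions.

```python
# Compile a statement without running it and inspect the stored constants.
code = compile("y = 499 + 501", "<example>", "exec")

# If the compiler folded the addition, the result 1000 is already a constant.
folded = 1000 in code.co_consts

print(code.co_consts)
print(folded)
```

If folded is True, the addition was carried out at compile time, so no intermediate 499 and 501 objects are created when the code runs.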

But isn't that wasteful? Well, yes, but that is the price you pay for all the great benefits of Python. You never have to worry about cleaning up such intermediate objects, or even know that they exist. The good news is that these operations are relatively fast, and you never had to know about any of these details until now.

The creators of Python wisely noticed this overhead and decided to add a few optimizations. The result is behavior that can surprise newcomers:

>>> x = 20
>>> y = 19 + 1
>>> x is y
True

This example is almost the same as the one above, except that the result is True. This is the work of interned objects. Python pre-creates a certain subset of objects in memory and keeps them in the global namespace for everyday use.

Which objects are interned depends on the Python implementation. In CPython 3.7, the interned objects are:

  1. Integers between -5 and 256.
  2. Strings that contain only ASCII letters, digits, or underscores.

The reasoning is that these values are used very often in many programs. By interning them, Python avoids allocating memory again and again for frequently used objects.
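The integer range can be checked with a short experiment. The sketch below assumes CPython and builds the numbers with int() at runtime so that the compiler cannot merge equal constants; the variable names are illustrative only.

```python
# int() calls build the values at runtime, so no compile-time sharing can occur.
small_a = int("19") + 1
small_b = int("20")
small_shared = small_a is small_b     # True on CPython: 20 is in the cached range

big_a = int("499") + int("501")
big_b = int("1000")
big_shared = big_a is big_b           # False on CPython: 1000 is outside the range

print(small_shared, big_shared)
```

Because this is an implementation detail, program logic should compare integers with == and never rely on is.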

Strings that are shorter than 20 characters and contain only ASCII letters, digits, or underscores are interned because they are assumed to be used as identifiers:

>>> s1 = "realpython"
>>> id(s1)
140696485006960
>>> s2 = "realpython"
>>> id(s2)
140696485006960
>>> s1 is s2
True

Here s1 and s2 point to the same address in memory. If the string contained anything other than ASCII letters, digits, or underscores, the result would be different:

>>> s1 = "Real Python!"
>>> s2 = "Real Python!"
>>> s1 is s2
False

This example uses an exclamation point, so the strings are not interned and are different objects in memory.

Bonus: If you really want these objects to refer to the same interned object, you can use sys.intern(). One use for this function is described in the documentation:

String interning is useful for a slight gain in dictionary lookup performance: if the keys in the dictionary and the lookup key are interned, then key comparisons (after hashing) can be done by comparing pointers rather than strings. (Source)
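A minimal sketch of sys.intern() in action follows. The strings are assembled at runtime with str.join() so that they start out as separate objects; that they are distinct before interning is CPython behavior rather than a language guarantee.

```python
import sys

# Build the strings at runtime so that the compiler cannot share one constant.
s1 = " ".join(["Real", "Python!"])
s2 = " ".join(["Real", "Python!"])
distinct_before = s1 is not s2        # separate objects with equal contents

s1 = sys.intern(s1)
s2 = sys.intern(s2)
shared_after = s1 is s2               # both names now refer to the interned copy

print(distinct_before, shared_after)  # True True
```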

Interned objects are a frequent source of confusion. Just remember that whenever you are in doubt, you can use id() and is to check whether two names refer to the same object.

Emulating Pointers in Python


The fact that Python has no native pointers does not mean you cannot get the benefits of pointers. There are actually several ways to emulate pointers in Python. Here we look at two of them:

  1. Using mutable types as pointers.
  2. Using specially prepared Python objects.

Using Mutable Types as Pointers


You already know what mutable types are. It is thanks to their mutability that we can emulate the behavior of pointers. Let's say you need to replicate this code:

void add_one(int *x) {
    *x += 1;
}

This code takes a pointer to an integer (*x) and increments the value it points to by 1. Here is a main function that exercises the code:

#include <stdio.h>
int main(void) {
    int y = 2337;
    printf("y = %d\n", y);
    add_one(&y);
    printf("y = %d\n", y);
    return 0;
}

In this fragment, we assigned the value 2337 to y, printed the current value, incremented it by 1, and then printed the new value. The output looks like this:

y = 2337
y = 2338

One way to replicate this behavior in Python is to use a mutable type. For example, apply a list and change the first element:

>>> def add_one(x):
...    x[0] += 1
...
>>> y = [2337]
>>> add_one(y)
>>> y[0]
2338

Here, add_one(x) accesses the first element and increments its value by 1. Because a list is used, the caller sees the changed value. So are there pointers in Python after all? No. This behavior is possible only because list is a mutable type. If you try the same thing with a tuple, you get an error:

>>> z = (2337,)
>>> add_one(z)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in add_one
TypeError: 'tuple' object does not support item assignment

This code demonstrates that a tuple is immutable, so it does not support item assignment.

list is not the only mutable type; pointers are also often emulated with dict.

Suppose you have an application that should track the occurrence of interesting events. This can be done by creating a dictionary and using one of its elements as a counter:

>>> counters = {"func_calls": 0}
>>> def bar():
...    counters["func_calls"] += 1
...
>>> def foo():
...    counters["func_calls"] += 1
...    bar()
...
>>> foo()
>>> counters["func_calls"]
2

In this example, the counters dictionary keeps track of the number of function calls. After the call to foo(), the counter has increased to 2, as expected. And all of this is thanks to the mutability of dict.

Keep in mind that this only emulates pointer behavior; it does not map directly onto real pointers in C and C++. In other words, these operations are more expensive than they would be in C or C++.

Using Python Objects


A dict is a great way to emulate pointers in Python, but it can get tedious to remember which key name you used, especially if the dictionary is used in different parts of the application. A custom Python class can help here.

Let's say you need to track metrics in an application. A great way to abstract away the annoying details is to create a class:

class Metrics(object):
    def __init__(self):
        self._metrics = {
            "func_calls": 0,
            "cat_pictures_served": 0,
        }

This code defines a Metrics class. It still uses a dictionary, held in the _metrics member variable, to store the actual data, which gives you the mutability you need. Now you only need a way to access these values. A nice way to do that is with properties:

class Metrics(object):
    # ...
    @property
    def func_calls(self):
        return self._metrics["func_calls"]
    @property
    def cat_pictures_served(self):
        return self._metrics["cat_pictures_served"]

Here we use @property. If you are new to decorators, read the Primer on Python Decorators article. In this case, the @property decorator lets you access func_calls and cat_pictures_served as if they were attributes:

>>> metrics = Metrics()
>>> metrics.func_calls
0
>>> metrics.cat_pictures_served
0

Being able to refer to these names as attributes means you have abstracted away the fact that the values are stored in a dictionary. It also makes the attribute names more explicit. Of course, you also need to be able to increment the values:

class Metrics(object):
    # ...
    def inc_func_calls(self):
        self._metrics["func_calls"] += 1
    def inc_cat_pics(self):
        self._metrics["cat_pictures_served"] += 1

We introduced two new methods:

  1. inc_func_calls()
  2. inc_cat_pics()

They modify the values in the _metrics dictionary. You now have a class that you can modify as if you were modifying a pointer:

>>> metrics = Metrics()
>>> metrics.inc_func_calls()
>>> metrics.inc_func_calls()
>>> metrics.func_calls
2

You can access func_calls and call inc_func_calls() in different parts of an application and emulate pointers in Python this way. This is useful when you have something like metrics that needs to be used and updated frequently in different parts of the application.

Note: Here, explicitly defining inc_func_calls() and inc_cat_pics() instead of using @property.setter prevents users from setting these values to an arbitrary int or to an invalid value such as a dict.
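The effect of leaving out the setter can be seen in a short experiment. The sketch below uses a trimmed-down copy of the Metrics class with a single counter, so that it runs on its own; the assignment_blocked flag is only there for illustration.

```python
class Metrics(object):
    def __init__(self):
        self._metrics = {"func_calls": 0}

    @property
    def func_calls(self):
        return self._metrics["func_calls"]

    def inc_func_calls(self):
        self._metrics["func_calls"] += 1

metrics = Metrics()
metrics.inc_func_calls()

try:
    metrics.func_calls = {"oops": 1}   # no setter defined, so this is rejected
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True

print(metrics.func_calls, assignment_blocked)   # 1 True
```

A read-only property raises AttributeError on assignment, so the counter can only change through the increment method.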

Here is the complete source code of the Metrics class:

class Metrics(object):
    def __init__(self):
        self._metrics = {
            "func_calls": 0,
            "cat_pictures_served": 0,
        }
    @property
    def func_calls(self):
        return self._metrics["func_calls"]
    @property
    def cat_pictures_served(self):
        return self._metrics["cat_pictures_served"]
    def inc_func_calls(self):
        self._metrics["func_calls"] += 1
    def inc_cat_pics(self):
        self._metrics["cat_pictures_served"] += 1

Real pointers using ctypes


But maybe there are pointers in Python after all, specifically in CPython? With the built-in ctypes module, you can create real C-style pointers. If you are new to ctypes, you can read the article Extending Python With C Libraries and the “ctypes” Module.

This comes in handy when you need to call a C library function that requires a pointer. Let's go back to the add_one() C function from before:

void add_one(int *x) {
    *x += 1;
}

As a reminder, this code increments the value that x points to by 1. To use it, first compile the code into a shared object. Assuming the file above is saved as add.c, you can do this with gcc:

$ gcc -c -Wall -Werror -fpic add.c
$ gcc -shared -o libadd1.so add.o

The first command compiles the C source file into an object called add.o. The second command takes this unlinked object file and produces a shared object called libadd1.so.

libadd1.so should now be in your current directory. You can load it into Python using ctypes:

>>> import ctypes
>>> add_lib = ctypes.CDLL("./libadd1.so")
>>> add_lib.add_one
<_FuncPtr object at 0x7f9f3b8852a0>

The ctypes.CDLL call returns an object that represents the libadd1 shared object. Because you defined add_one() in this shared object, you can access it as if it were any other Python object. Before you call the function, though, you should specify its signature. This helps Python ensure that you pass the right type to the function.

In this case, the function takes a pointer to an integer, and ctypes lets you specify this with the following code:

>>> add_one = add_lib.add_one
>>> add_one.argtypes = [ctypes.POINTER(ctypes.c_int)]

Here we set the function signature to match what C expects. Now, if we try to call the function with the wrong type, we get a clear error instead of undefined behavior:

>>> add_one(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ctypes.ArgumentError: argument 1: <class 'TypeError'>: \
expected LP_c_int instance instead of int

Python raises an error explaining that add_one() wants a pointer rather than just an integer. Fortunately, ctypes has a way to pass pointers to such functions. First, declare a C-style integer:

>>> x = ctypes.c_int()
>>> x
c_int(0)

Here we created an integer x with the value 0. ctypes provides the handy byref() function, which lets you pass a variable by reference.

Note: Passing by reference is the opposite of passing a variable by value.

When you pass by reference, you pass a reference to the original variable, so changes are reflected in the original. When you pass by value, the function gets a copy of the original variable, and changes made to the copy do not affect the original.
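For comparison, plain Python function calls fit neither description exactly: the function receives a reference to the same object, but rebinding the parameter name inside the function does not affect the caller. The sketch below illustrates this with two hypothetical helper functions, rebind() and mutate(), that are not part of the article's code.

```python
def rebind(number):
    number = number + 1      # rebinds the local name only

def mutate(box):
    box[0] += 1              # changes the shared list in place

value = 2337
rebind(value)                # the caller's name still refers to 2337

boxed = [2337]
mutate(boxed)                # the caller sees the change inside the list

print(value, boxed[0])       # 2337 2338
```

This is why the list-based emulation shown earlier works, while passing a bare int would not.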

You can call add_one() with this code:

>>> add_one(ctypes.byref(x))
998793640
>>> x
c_int(1)

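Excellent! Your integer was incremented by 1. (The number printed right after the call is not meaningful: add_one() returns void, but ctypes assumes an int return value by default.) Congratulations, you have successfully used real pointers in Python.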
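If you do not have a compiled library at hand, you can still experiment with real pointers using ctypes alone. The sketch below is an illustration under that assumption: add_one() here is a hypothetical Python stand-in for the C function, operating on a pointer created with ctypes.pointer().

```python
import ctypes

def add_one(int_ptr):
    """Hypothetical Python stand-in for the C add_one(): increments the pointee."""
    int_ptr.contents.value += 1

x = ctypes.c_int(2337)
x_ptr = ctypes.pointer(x)      # a real pointer object holding the address of x

add_one(x_ptr)
print(x.value)                 # 2338
print(x_ptr[0])                # 2338, pointers can also be indexed like C arrays
```

Unlike byref(), which is meant only for passing arguments to foreign functions, pointer() builds a full pointer object that you can dereference from Python.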

Conclusion


You now have a better understanding of the relationship between Python objects and pointers. Although some of the distinctions between names and variables may seem pedantic, understanding these key terms deepens your understanding of how Python handles variables.

We also looked at several ways to emulate pointers in Python:

  • Using mutable objects as low-overhead pointers.
  • Creating custom Python objects for ease of use.
  • Unlocking real pointers with the ctypes module.

These methods let you emulate pointers in Python without giving up the memory safety that the language provides.
