LinearLeopard June 24, 2019 at 22:22

Integer Type Implementation in CPython

There were already articles on Habré about the implementation details of CPython's memory manager , Pandas , I wrote an article about the implementation of the dictionary .

It would seem that you can write about a regular integer type? However, not everything is so simple and the integer type is not so obvious.

If you are wondering why x * 2 is faster than x << 1 .

And how to crank up the following trick:

>>> 42 == 4
True
>>> 42
4
>>> 1 + 41
4

Then you should read this article.

There are no primitive types in python - all variables are objects and are allocated in dynamic memory. At the same time, integers are represented by only one type ( we do not consider decimal ) - PyLongObject. The implementation and declarations of which lie in the files longobject.h, longintrepr.h and longobject.c.

struct _longobject {
    PyObject_VAR_HEAD   //  стандартный заголовок для объектов переменной длины
    digit ob_digit[1];  //  массив чисел
};

In any object in CPython there are two fields: ob_refcnt - counter of references to the object and ob_type - pointer to the type of the object; to objects that can change their length, the ob_size field is added - the current allocated size (used may be less).

Thus, the integer type is represented by an array of variable length from separate digits, so python out of the box supports long arithmetic, in the second version of the language there was a separate type of “ordinary” integers. “Long” integers were created using the letter L or, if the result of operations with caused overflow by ordinary ones; in the third version it was decided to refuse it.

    # python2
    num1 = 42
    print num1, type(num1)  # 42 
    num2 = 42L
    print num2, type(num2)  # 42 
    num3 = 2 ** 100
    print num3, type(num3)  # 1267650600228229401496703205376

The integer value stored by the _longobject structure is equal to:

$\ sum_ {i = 0} ^ {abs (ob \ _size) -1} {ob \ _digit_i * 2 ^ {SHIFT * i}}$

A 32-bit unsigned type (uint32_t) on 64-bit systems and a 16-bit unsigned short type on 32-bit are used as a bit.

The algorithms used in the implementation impose strict restrictions on the SHIFT from the previous formula, in particular, it should be divided by 15, so now cpython supports two values: 30 and 15, respectively, for 64 and 32-bit systems. For negative values ob_size has a negative value, zero is given by a number for which ob_size = 0.

When performing arithmetic operations, the interpreter checks the current length of the operands, if both of them consist of the same bit, then standard arithmetic is performed.

static PyObject *
long_add(PyLongObject *a, PyLongObject *b)
{
    ...
    // оба операнда - вмещаются в один разряд
    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
    // суммировать их и выдать результат
        return PyLong_FromLong(MEDIUM_VALUE(a) + MEDIUM_VALUE(b));
    }
    ...
};

Multiplication has a similar structure, in addition, the interpreter implements the Karatsuba algorithm and quick squaring , however, they are not performed for every “long multiplication”, but only for sufficiently large numbers, the number of digits in which are given by two constants:

// в случае неравных операндов
#define KARATSUBA_CUTOFF 70
// в случае возведения в квадрат
#define KARATSUBA_SQUARE_CUTOFF (2 * KARATSUBA_CUTOFF)

static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
    ...
    // оба операнда - вмещаются в один разряд
    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
    // stwodigits - знаковый тип, вмещающий два разряда
    // int64_t на 64-битных системах и long на 32-битных.
        stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
        return PyLong_FromLongLong((long long)v);
    }
    ...
    // "Школьное" умножение в столбик O(N^2), если оба операнда достаточно малы.
    i = a == b ? KARATSUBA_SQUARE_CUTOFF : KARATSUBA_CUTOFF;
    if (asize <= i) {
        if (asize == 0)
            return (PyLongObject *)PyLong_FromLong(0);
        else
            return x_mul(a, b);
    }
    ...
};

However, shift commands do not have a check for the case of "short" integers; they are always executed for the case of long arithmetic. Therefore, multiplying by 2 is faster than the shift. It is interesting to note that the division is slower than the right shift, which also does not check the case of single-digit integers. But the division is more complicated: there is one method that first calculates the quotient, then the remainder, NULL is passed to it, if you need to calculate one thing, that is, the method must check this case too, all this increases the overhead.

The comparison function also does not have a special case for “short” integers.

static int
long_compare(PyLongObject *a, PyLongObject *b)
{
    Py_ssize_t sign;
    // случай, когда длины чисел не равны просто определяем
    // в каком больше разрядов с учётом знака
    if (Py_SIZE(a) != Py_SIZE(b)) {
        sign = Py_SIZE(a) - Py_SIZE(b);
    }
    else {
        Py_ssize_t i = Py_ABS(Py_SIZE(a));
        // для "коротких" целых будем просто меньше крутиться в цикле
        while (--i >= 0 && a->ob_digit[i] == b->ob_digit[i])
            ;
        if (i < 0)
            sign = 0;
        else {
            sign = (sdigit)a->ob_digit[i] - (sdigit)b->ob_digit[i];
            if (Py_SIZE(a) < 0)
                sign = -sign;
        }
    }
    return sign < 0 ? -1 : sign > 0 ? 1 : 0;
}

Arrays (lists) of numbers

When creating an integer type variable, the interpreter should allocate sufficient space in dynamic memory, then set the reference counter (type ssize_t), a pointer to type PyLongObject, the current size of the bit array (also ssize_t) and initialize the array itself. For 64-bit systems, the minimum structure size is: 2 * ssize_t + pointer + digit = 2 * 8 + 8 + 4 = 28 bytes. Additional problems appear when creating lists of numbers: since numbers are not a primitive type, and lists in python store references to objects, the objects are not sequentially in dynamic memory.

This placement slows down iteration over the array: in fact, random access is performed, which makes it impossible to predict transitions.

In order to avoid excessive allocation of dynamic memory, which slows down the operating time and contributes to memory fragmentation, optimization is implemented in cpython: at the start-up phase, an array of small integers is pre-allocated: from -5 to 256.

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257  // Как и положено в python правая граница не входит.
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif

As a result, the list of numbers in cpython appears in memory as follows:

There is the possibility of reaching the pre-selected list of small integers from the script, armed with this article and the standard ctypes module:

Disclaimer: The following code is supplied as is, the author does not bear any responsibility and cannot guarantee the state of the interpreter, as well as the mental health of you and your colleagues, after running this code.

import ctypes
# структура PyLongObject
class IntStruct(ctypes.Structure):
    # объявление полей
    _fields_ = [("ob_refcnt", ctypes.c_long),
                ("ob_type", ctypes.c_void_p),
                ("ob_size", ctypes.c_long),
                ("ob_digit", ctypes.c_int)]
    def __repr__(self):
        return ("IntStruct(ob_digit={self.ob_digit}, ob_size={self.ob_size}, "
                "refcount={self.ob_refcnt})").format(self=self)
if __name__ == '__main__':
    # получаем адрес числа 42
    int42 = IntStruct.from_address(id(42))
    print(int42)
    int_minus_2 = IntStruct.from_address(id(-2))
    print(int_minus_2)  # ob_digit=2, ob_size=-1
    # меняем значение в списке предвыделенных чисел
    int42.ob_digit = 4
    print(4 == 42)  # True
    print(1 + 41)   # 4

Honest PyLongObject is stored in this list, that is, they have a reference counter, for example, you can find out how many zeros your script and the interpreter use.

From the internal implementation of integers it follows that arithmetic in cpython cannot be fast, where other languages sequentially iterate over the array, reading numbers into registers and directly calling several processor instructions, cpython is tagged throughout the memory, performing rather complicated methods. In the simplest case of single-bit integers, in which it would be enough to call one instruction, the interpreter must compare the sizes, then create an object in dynamic memory, fill in the service fields, and return a pointer to it; moreover, these actions require a GIL lock. Now you understand why the numpy library came about and why it is so popular.

I would like to finish the article about the whole type in cpython, all of a sudden, with information about the boolean type.

The fact is that for a long time in python there was no separate type for Boolean variables, all logical functions returned 0 and 1. However, they decided to introduce a new type . At the same time, it was implemented as a child type to the whole.

PyTypeObject PyBool_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)  // длина булевого типа меняться не может
                                            // однако, так как он наследуется от целого
                                            // там уже присутствует поле ob_size
    "bool",                                 // имя типа
    sizeof(struct _longobject),             // размер в памяти
    ...
    &PyLong_Type,                           // базовый тип
    ...
    bool_new,                               // конструктор
};

Moreover, each of the values of the Boolean type is a singleton, a Boolean variable is a pointer to an instance of True or False (None is implemented in a similar way).

struct _longobject _Py_FalseStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 0)
    { 0 }
};
struct _longobject _Py_TrueStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 1)
    { 1 }
};
static PyObject *
bool_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    PyObject *x = Py_False;
    long ok;
    if (!_PyArg_NoKeywords("bool", kwds))
        return NULL;
    if (!PyArg_UnpackTuple(args, "bool", 0, 1, &x))
        return NULL;
    ok = PyObject_IsTrue(x);
    if (ok < 0)
        return NULL;
    return PyBool_FromLong(ok);
}
PyObject *PyBool_FromLong(long ok)
{
    PyObject *result;
    if (ok)
        result = Py_True;
    else
        result = Py_False;
    Py_INCREF(result);
    return result;
}

PyObject_IsTrue (x) is a tricky mechanism for calculating a Boolean value, you can see it here in the section on the bool function or in the documentation .

Such a legacy leads to some funny effects, such as the exact equality of True and 1, the inability to have True and 1 as keys in the dictionary and the set, and the admissibility of arithmetic operations on the Boolean type:

>>> True == 1
True
>>> {True: "true", 1: "one"}
{True: 'one'}
>>> True * 2 + False / (5 * True) - (True << 3)
-6.0
>>> False < -1
False

The python language is magnificent for its flexibility and readability, however, it must be borne in mind that all this comes at a price, for example, in the form of superfluous abstractions and overhead, which we often do not think about or guess about. I hope this article has allowed you to dispel the “fog of war” a little over the source code of the interpreter, perhaps even induce you to study it. The interpreter code is very easy to read, almost like the code on python itself, and studying it will allow you to not only learn how the interpreter is implemented, but also interesting algorithmic and system solutions, as well as possibly write more efficient code or become a developer of cpython itself.

Tags:

Integer Type Implementation in CPython

Arrays (lists) of numbers

Also popular now: