The story of one experiment with Cython and C ++ vector

One ~~warm~~ on a cold winter evening, I wanted to warm up in the office and test the theory of one colleague that the C ++ vector could cope with the task faster than the CPython list.

In the company we are developing products based on Django and it so happened that it was necessary to process one large array of dictionaries. A colleague suggested that the implementation in C ++ would be much faster, but I had the feeling that Guido and the community were probably a little steeper than us in C, and perhaps already decided and avoided all the pitfalls, having implemented everything much faster.

To test the theory, I decided to write a small test file, in which I decided to loop through the insertion of 1M dictionaries of the same content into an array and vector 100 times in a row.

The results, though expected, were also sudden.

It so happened that we actively use Cython, so in general, the results will differ in fully CPython implementation.

Stand

Calculate Linux onegreyonewhite 4.18.14-calculate # 1 SMP PREEMPT Sat Oct 13 21:03:27 UTC 2018 x86_64 Intel® Core (TM) i7-4770 CPU @ 3.40GHz GenuineIntel GNU / Linux
Python 2.7 and 3.6
Cython 0.28.3
gcc (Gentoo 7.3.0-r3 p1.4)

Script

By the way, I had to tinker here. To get the most realistic numbers (i.e., not just to be super-optimized, but also so that we can use it without dancing with a tambourine), we had to do everything in the main script, and reduce all additional .h to a minimum.

The first problem was that the Cython wrapper for the vector does not want to work in this form:

# Так не хотел
ctypedef vector[object] dict_vec
# И так не завелось (ошибка появлялась на vector.push_back(dict()))
ctypedef vector[PyObject*] dict_vec
# И даже так, что удивительно (просто говорит, что не может object привести к PyObject.)
ctypedef vector[PyObject] dict_vec

With all this, they got an error that it is impossible to cast dict to PyObject. Of course, these are Cython problems, but since we use it, we need to solve this particular problem.
I had to make a little crutch in the form

#include"Python.h"static PyObject * convert_to_pyobject(PyObject *obj){
    return obj;
}

The most amazing thing is that it worked. What scares me the most is that I do not fully understand why and what the consequences are.

Final Sources

cython_experiments.h

#include"Python.h"static PyObject * convert_to_pyobject(PyObject *obj){
    return obj;
}

cython_experiments.pyx

# -*- coding: utf-8 -*-# distutils: language = c++# distutils: include=['./']# distutils: extra_compile_args=["-O1"]from __future__ import unicode_literals
import time
from libc.stdlib cimport free
from cpython.dict cimport PyDict_New, PyDict_SetItemString
from cpython.ref cimport PyObject
from libcpp.string cimport string
from libcpp.vector cimport vector
cdef extern from"cython_experiments.h":
    PyObject* convert_to_pyobject(object obj)
ctypedef vector[PyObject*] dict_vec
range_attempts = 10 ** 6# Insert time
cdef test_list():
    t_start = time.time()
    data_list = list()
    for i from0 <= i < range_attempts:
        data_list.append(dict(
            name = 'test_{}'.format(i),
            test_data = i,
            test_data2 = str(i),
            test_data3 = range(10),
        ))
    del data_list
    return time.time() - t_start
cdef test_vector():
    t_start = time.time()
    cdef dict_vec *data_list
    data_list = new dict_vec()
    data_list.resize(range_attempts)
    for i from0 <= i < range_attempts:
        data = PyDict_New()
        PyDict_SetItemString(data, 'name', 'test_{}'.format(i))
        PyDict_SetItemString(data, 'test_data', i)
        PyDict_SetItemString(data, 'test_data2', str(i))
        PyDict_SetItemString(data, 'test_data3', range(10))
        data_list.push_back(convert_to_pyobject(data))
    free(data_list)
    return time.time() - t_start
# Get statistic
times = dict(list=[], vector=[])
attempts = 100for i from0 <= i < attempts:
    times['list'].append(test_list())
    times['vector'].append(test_vector())
    print('''
    Attempt: {}
    List time: {}
    Vector time: {}
    '''.format(i, times['list'][-1], times['vector'][-1]))
avg_list = sum(times['list']) / attempts
avg_vector = sum(times['vector']) / attempts
print('''
Statistics:
attempts: {}
list avg time: {} 
vector avg time: {} 
'''.format(attempts, avg_list, avg_vector))

Attempt 1

I really want to be able to collect * .whl for the project and that it all started on almost any system, so the optimization flag was first set to 0. This led to a strange result:

Python 2.7
Statistics:
attempts: 100
list avg time: 2.61709237576
vector avg time: 2.92562381506

After some thought, I decided that we still use the -O1 flag, so I set it up and got it:

Python 2.7
Statistics:
attempts: 100
list avg time: 2.49274396896
vector avg time: 0.922211170197

Once I got a little excited: all the same, the belief in the professionalism of Guido and Co. let me down. But then, I noticed that the script is somehow suspiciously eating the memory, and by the end it was eating up about 20GB of RAM. The problem was as follows: in the final script, you can observe the function free, after passing the cycle. At this iteration it was not yet. Then I thought ...

And if I do not disable gc?

Between attempts I did gc.disable () and after trying gc.enable () . I run the build and script and get:

Python 2.7
Statistics:
attempts: 100
list avg time: 1.00309731514
vector avg time: 0.941153049469

In general, the difference is not big, so I thought that there is no point ~~overpay~~try to somehow pervert and just use CPython, but still collect it with Cython.
Probably many had a question: "What about memory?" The most amazing (no) is that nothing. She grew at the same rate and in the same amount. An article came to mind , but Python did not want to go into the sources. And this meant only one thing - the problem in the implementation of the vector.

The final

After a lot of torment with type conversion, namely, that the vector takes a pointer to the dictionary, that final script was obtained and with gc turned on, I received an average of 2.6 times the difference (vector faster) and relatively good work with memory.

Suddenly it dawned on me that I collect everything only under Py2.7 and did not even try to do anything with 3.6.

And here I was really surprised (after previous results, the surprise was natural):

Python 3.6
Statistics:
attempts: 100
list avg time: 0.8771139788627624
vector avg time: 1.075702157020569
Python 2.7
Statistics:
attempts: 100
list avg time: 2.61709237576
vector avg time: 0.92562381506

With all this, gc still worked, the memory was not erased and it was the same script. Understanding that after a little more than a year, I would have to say goodbye to 2.7, I still didn’t give rest that there was such a difference between them. Most often, I heard / read / experimented and Py3.6 was slower than Py2.7. However, the guys from Cython-developers did something incredible and changed the situation at the root.

Total

After this experiment, we decided not to bother much with the support of Python 2.7 and the alteration of any parts of the applications in C ++, simply because it is not worth it. Everything has already been written to us, we can only use this correctly to solve a specific problem.

UPD 12/24/2018 :
Following the advice of iCpu, and after attacks to the side, what is being checked is not to understand what and how, I tried to rewrite the C ++ part as convenient as possible for further development, as well as minimize abstractions. It turned out even worse:

The result of poor knowledge of C ++

cython_experiments.h

#include"Python.h"#include<vector>#include<algorithm>#ifndef PyString_AsString#define PyString_AsString PyUnicode_AsUTF8#define PyString_FromString PyUnicode_FromString#endiftypedefstruct {char* name;
    bool reverse;
} sortFiled;
classcmpclass {public:
        cmpclass(std::vector<char*> fields) {
            for (std::vector<char*>::iterator it = fields.begin() ; it < fields.end(); it++){
                bool is_reverse = false;
                char* name;
                if (it[0] == "-"){
                    is_reverse = true;
                    for(int i=1; i<strlen(*it); ++i)
                        name[i] = *it[i];
                }
                else {
                    name = *it;
                }
                sortFiled field = {name, is_reverse};
                this->fields_to_cmp.push_back(field);
            }
        }
        ~cmpclass() {
            this->fields_to_cmp.clear();
            this->fields_to_cmp.shrink_to_fit();
        }
        booloperator()(PyObject* left, PyObject* right){
            //bool result = false;
            for (std::vector<sortFiled>::iterator it = this->fields_to_cmp.begin() ; it < this->fields_to_cmp.end(); it++){
                //
                PyObject* str_name = PyString_FromString(it->name);
                PyObject* right_value = PyDict_GetItem(right, str_name);
                PyObject* left_value = PyDict_GetItem(left, str_name);
                if(!it->reverse){
                    result = left_value < right_value;
                } else {
                    result = (left_value > right_value);
                }
                PyObject_Free(str_name);
                if(!result)
                    returnfalse;
            }
            returntrue;
        }
    private:
        std::vector<sortFiled> fields_to_cmp;
};
voidvector_multikeysort(std::vector<PyObject *> items, PyObject* columns, bool reverse){
    std::vector<char *> _columns;
    for (int i=0; i<PyList_GET_SIZE(columns); ++i) {
        PyObject* item = PyList_GetItem(columns, i);
        char* item_str = PyString_AsString(item);
        _columns.push_back(item_str);
    }
    cmpclass cmp_obj(_columns);
    std::sort(items.begin(), items.end(), cmp_obj);
    if(reverse)
        std::reverse(items.begin(), items.end());
}
std::vector<PyObject *> _test_vector(PyObject* store_data_list, PyObject* columns, bool reverse = false)
{
    int range_attempts = PyList_GET_SIZE(store_data_list);
    std::vector<PyObject *> data_list;
    for (int i=0; i<range_attempts; ++i) {
        data_list.push_back(PyList_GetItem(store_data_list, i));
    }
    vector_multikeysort(data_list, columns, reverse);
    return data_list;
}

cython_experiments.pyx

# -*- coding: utf-8 -*-# distutils: language = c++# distutils: include=['./']# distutils: extra_compile_args=["-O2", "-ftree-vectorize"]from __future__ import unicode_literals
import time
from libc.stdlib cimport free
from cpython.dict cimport PyDict_New, PyDict_SetItemString
from cpython.ref cimport PyObject
from libcpp.string cimport string
from libcpp.vector cimport vector
import gc
cdef extern from"cython_experiments.h":
    vector[PyObject*] _test_vector(object store_data_list, object columns, int reverse)
range_attempts = 10 ** 6
store_data_list = list()
for i from0 <= i < range_attempts:
    store_data_list.append(dict(
        name = 'test_{}'.format(i),
        test_data = i,
        test_data2 = str(i),
        test_data3 = range(10),
    ))
# Insert timedefmultikeysort(items, columns, reverse=False):
    items = list(items)
    columns = list(columns)
    columns.reverse()
    for column in columns:
        # pylint: disable=cell-var-from-loop
        is_reverse = column.startswith('-')
        if is_reverse:
            column = column[1:]
        items.sort(key=lambda row: row[column], reverse=is_reverse)
    if reverse:
        items.reverse()
    return items
cdef test_list():
    t_start = time.time()
    data_list = list()
    for i in store_data_list:
        data_list.append(i)
    data_list = multikeysort(data_list, ('name', '-test_data'), True)
    for i in data_list:
        i
    del data_list
    return time.time() - t_start
cdef test_vector():
    t_start = time.time()
    data_list = _test_vector(store_data_list, ['name', '-test_data'], 1)
    for i in data_list:
        i
    return time.time() - t_start
# Get statistic
times = dict(list=[], vector=[])
attempts = 10
gc.disable()
for i from0 <= i < attempts:
    times['list'].append(test_list())
    times['vector'].append(test_vector())
    gc.collect()
    print('''
    Attempt: {}
    List time: {}
    Vector time: {}
    '''.format(i, times['list'][-1], times['vector'][-1]))
del store_data_list
avg_list = sum(times['list']) / attempts
avg_vector = sum(times['vector']) / attempts
print('''
Statistics:
attempts: {}
list avg time: {} 
vector avg time: {} 
'''.format(attempts, avg_list, avg_vector))

Python 3.6
Statistics:
attempts: 10
list avg time: 0.2640914678573608 
vector avg time: 2.5774293661117555

Any idea what could be improved in the coparator to make it work faster?

Only registered users can participate in the survey. Sign in , please.

How do you optimize your Python applications?

Tags: