The story of one experiment with Cython and C ++ vector

    One warm on a cold winter evening, I wanted to warm up in the office and test the theory of one colleague that the C ++ vector could cope with the task faster than the CPython list.


    In the company we are developing products based on Django and it so happened that it was necessary to process one large array of dictionaries. A colleague suggested that the implementation in C ++ would be much faster, but I had the feeling that Guido and the community were probably a little steeper than us in C, and perhaps already decided and avoided all the pitfalls, having implemented everything much faster.


    To test the theory, I decided to write a small test file, in which I decided to loop through the insertion of 1M dictionaries of the same content into an array and vector 100 times in a row.


    The results, though expected, were also sudden.


    It so happened that we actively use Cython, so in general, the results will differ in fully CPython implementation.


    Stand


    • Calculate Linux onegreyonewhite 4.18.14-calculate # 1 SMP PREEMPT Sat Oct 13 21:03:27 UTC 2018 x86_64 Intel® Core (TM) i7-4770 CPU @ 3.40GHz GenuineIntel GNU / Linux
    • Python 2.7 and 3.6
    • Cython 0.28.3
    • gcc (Gentoo 7.3.0-r3 p1.4)

    Script


    By the way, I had to tinker here. To get the most realistic numbers (i.e., not just to be super-optimized, but also so that we can use it without dancing with a tambourine), we had to do everything in the main script, and reduce all additional .h to a minimum.


    The first problem was that the Cython wrapper for the vector does not want to work in this form:


    # Так не хотел
    ctypedef vector[object] dict_vec
    # И так не завелось (ошибка появлялась на vector.push_back(dict()))
    ctypedef vector[PyObject*] dict_vec
    # И даже так, что удивительно (просто говорит, что не может object привести к PyObject.)
    ctypedef vector[PyObject] dict_vec

    With all this, they got an error that it is impossible to cast dict to PyObject. Of course, these are Cython problems, but since we use it, we need to solve this particular problem.
    I had to make a little crutch in the form

    #include"Python.h"static PyObject * convert_to_pyobject(PyObject *obj){
        return obj;
    }

    The most amazing thing is that it worked. What scares me the most is that I do not fully understand why and what the consequences are.


    Final Sources

    cython_experiments.h


    #include"Python.h"static PyObject * convert_to_pyobject(PyObject *obj){
        return obj;
    }

    cython_experiments.pyx


    # -*- coding: utf-8 -*-# distutils: language = c++# distutils: include=['./']# distutils: extra_compile_args=["-O1"]from __future__ import unicode_literals
    import time
    from libc.stdlib cimport free
    from cpython.dict cimport PyDict_New, PyDict_SetItemString
    from cpython.ref cimport PyObject
    from libcpp.string cimport string
    from libcpp.vector cimport vector
    cdef extern from"cython_experiments.h":
        PyObject* convert_to_pyobject(object obj)
    ctypedef vector[PyObject*] dict_vec
    range_attempts = 10 ** 6# Insert time
    cdef test_list():
        t_start = time.time()
        data_list = list()
        for i from0 <= i < range_attempts:
            data_list.append(dict(
                name = 'test_{}'.format(i),
                test_data = i,
                test_data2 = str(i),
                test_data3 = range(10),
            ))
        del data_list
        return time.time() - t_start
    cdef test_vector():
        t_start = time.time()
        cdef dict_vec *data_list
        data_list = new dict_vec()
        data_list.resize(range_attempts)
        for i from0 <= i < range_attempts:
            data = PyDict_New()
            PyDict_SetItemString(data, 'name', 'test_{}'.format(i))
            PyDict_SetItemString(data, 'test_data', i)
            PyDict_SetItemString(data, 'test_data2', str(i))
            PyDict_SetItemString(data, 'test_data3', range(10))
            data_list.push_back(convert_to_pyobject(data))
        free(data_list)
        return time.time() - t_start
    # Get statistic
    times = dict(list=[], vector=[])
    attempts = 100for i from0 <= i < attempts:
        times['list'].append(test_list())
        times['vector'].append(test_vector())
        print('''
        Attempt: {}
        List time: {}
        Vector time: {}
        '''.format(i, times['list'][-1], times['vector'][-1]))
    avg_list = sum(times['list']) / attempts
    avg_vector = sum(times['vector']) / attempts
    print('''
    Statistics:
    attempts: {}
    list avg time: {} 
    vector avg time: {} 
    '''.format(attempts, avg_list, avg_vector))

    Attempt 1


    I really want to be able to collect * .whl for the project and that it all started on almost any system, so the optimization flag was first set to 0. This led to a strange result:


    Python 2.7
    Statistics:
    attempts: 100
    list avg time: 2.61709237576
    vector avg time: 2.92562381506

    After some thought, I decided that we still use the -O1 flag, so I set it up and got it:


    Python 2.7
    Statistics:
    attempts: 100
    list avg time: 2.49274396896
    vector avg time: 0.922211170197

    Once I got a little excited: all the same, the belief in the professionalism of Guido and Co. let me down. But then, I noticed that the script is somehow suspiciously eating the memory, and by the end it was eating up about 20GB of RAM. The problem was as follows: in the final script, you can observe the function free, after passing the cycle. At this iteration it was not yet. Then I thought ...


    And if I do not disable gc?


    Between attempts I did gc.disable () and after trying gc.enable () . I run the build and script and get:


    Python 2.7
    Statistics:
    attempts: 100
    list avg time: 1.00309731514
    vector avg time: 0.941153049469

    In general, the difference is not big, so I thought that there is no point overpaytry to somehow pervert and just use CPython, but still collect it with Cython.
    Probably many had a question: "What about memory?" The most amazing (no) is that nothing. She grew at the same rate and in the same amount. An article came to mind , but Python did not want to go into the sources. And this meant only one thing - the problem in the implementation of the vector.


    The final


    After a lot of torment with type conversion, namely, that the vector takes a pointer to the dictionary, that final script was obtained and with gc turned on, I received an average of 2.6 times the difference (vector faster) and relatively good work with memory.


    Suddenly it dawned on me that I collect everything only under Py2.7 and did not even try to do anything with 3.6.


    And here I was really surprised (after previous results, the surprise was natural):


    Python 3.6
    Statistics:
    attempts: 100
    list avg time: 0.8771139788627624
    vector avg time: 1.075702157020569
    Python 2.7
    Statistics:
    attempts: 100
    list avg time: 2.61709237576
    vector avg time: 0.92562381506

    With all this, gc still worked, the memory was not erased and it was the same script. Understanding that after a little more than a year, I would have to say goodbye to 2.7, I still didn’t give rest that there was such a difference between them. Most often, I heard / read / experimented and Py3.6 was slower than Py2.7. However, the guys from Cython-developers did something incredible and changed the situation at the root.


    Total


    After this experiment, we decided not to bother much with the support of Python 2.7 and the alteration of any parts of the applications in C ++, simply because it is not worth it. Everything has already been written to us, we can only use this correctly to solve a specific problem.


    UPD 12/24/2018 :
    Following the advice of iCpu, and after attacks to the side, what is being checked is not to understand what and how, I tried to rewrite the C ++ part as convenient as possible for further development, as well as minimize abstractions. It turned out even worse:


    The result of poor knowledge of C ++

    cython_experiments.h


    #include"Python.h"#include<vector>#include<algorithm>#ifndef PyString_AsString#define PyString_AsString PyUnicode_AsUTF8#define PyString_FromString PyUnicode_FromString#endiftypedefstruct {char* name;
        bool reverse;
    } sortFiled;
    classcmpclass {public:
            cmpclass(std::vector<char*> fields) {
                for (std::vector<char*>::iterator it = fields.begin() ; it < fields.end(); it++){
                    bool is_reverse = false;
                    char* name;
                    if (it[0] == "-"){
                        is_reverse = true;
                        for(int i=1; i<strlen(*it); ++i)
                            name[i] = *it[i];
                    }
                    else {
                        name = *it;
                    }
                    sortFiled field = {name, is_reverse};
                    this->fields_to_cmp.push_back(field);
                }
            }
            ~cmpclass() {
                this->fields_to_cmp.clear();
                this->fields_to_cmp.shrink_to_fit();
            }
            booloperator()(PyObject* left, PyObject* right){
                //bool result = false;
                for (std::vector<sortFiled>::iterator it = this->fields_to_cmp.begin() ; it < this->fields_to_cmp.end(); it++){
                    //
                    PyObject* str_name = PyString_FromString(it->name);
                    PyObject* right_value = PyDict_GetItem(right, str_name);
                    PyObject* left_value = PyDict_GetItem(left, str_name);
                    if(!it->reverse){
                        result = left_value < right_value;
                    } else {
                        result = (left_value > right_value);
                    }
                    PyObject_Free(str_name);
                    if(!result)
                        returnfalse;
                }
                returntrue;
            }
        private:
            std::vector<sortFiled> fields_to_cmp;
    };
    voidvector_multikeysort(std::vector<PyObject *> items, PyObject* columns, bool reverse){
        std::vector<char *> _columns;
        for (int i=0; i<PyList_GET_SIZE(columns); ++i) {
            PyObject* item = PyList_GetItem(columns, i);
            char* item_str = PyString_AsString(item);
            _columns.push_back(item_str);
        }
        cmpclass cmp_obj(_columns);
        std::sort(items.begin(), items.end(), cmp_obj);
        if(reverse)
            std::reverse(items.begin(), items.end());
    }
    std::vector<PyObject *> _test_vector(PyObject* store_data_list, PyObject* columns, bool reverse = false)
    {
        int range_attempts = PyList_GET_SIZE(store_data_list);
        std::vector<PyObject *> data_list;
        for (int i=0; i<range_attempts; ++i) {
            data_list.push_back(PyList_GetItem(store_data_list, i));
        }
        vector_multikeysort(data_list, columns, reverse);
        return data_list;
    }

    cython_experiments.pyx


    # -*- coding: utf-8 -*-# distutils: language = c++# distutils: include=['./']# distutils: extra_compile_args=["-O2", "-ftree-vectorize"]from __future__ import unicode_literals
    import time
    from libc.stdlib cimport free
    from cpython.dict cimport PyDict_New, PyDict_SetItemString
    from cpython.ref cimport PyObject
    from libcpp.string cimport string
    from libcpp.vector cimport vector
    import gc
    cdef extern from"cython_experiments.h":
        vector[PyObject*] _test_vector(object store_data_list, object columns, int reverse)
    range_attempts = 10 ** 6
    store_data_list = list()
    for i from0 <= i < range_attempts:
        store_data_list.append(dict(
            name = 'test_{}'.format(i),
            test_data = i,
            test_data2 = str(i),
            test_data3 = range(10),
        ))
    # Insert timedefmultikeysort(items, columns, reverse=False):
        items = list(items)
        columns = list(columns)
        columns.reverse()
        for column in columns:
            # pylint: disable=cell-var-from-loop
            is_reverse = column.startswith('-')
            if is_reverse:
                column = column[1:]
            items.sort(key=lambda row: row[column], reverse=is_reverse)
        if reverse:
            items.reverse()
        return items
    cdef test_list():
        t_start = time.time()
        data_list = list()
        for i in store_data_list:
            data_list.append(i)
        data_list = multikeysort(data_list, ('name', '-test_data'), True)
        for i in data_list:
            i
        del data_list
        return time.time() - t_start
    cdef test_vector():
        t_start = time.time()
        data_list = _test_vector(store_data_list, ['name', '-test_data'], 1)
        for i in data_list:
            i
        return time.time() - t_start
    # Get statistic
    times = dict(list=[], vector=[])
    attempts = 10
    gc.disable()
    for i from0 <= i < attempts:
        times['list'].append(test_list())
        times['vector'].append(test_vector())
        gc.collect()
        print('''
        Attempt: {}
        List time: {}
        Vector time: {}
        '''.format(i, times['list'][-1], times['vector'][-1]))
    del store_data_list
    avg_list = sum(times['list']) / attempts
    avg_vector = sum(times['vector']) / attempts
    print('''
    Statistics:
    attempts: {}
    list avg time: {} 
    vector avg time: {} 
    '''.format(attempts, avg_list, avg_vector))
    

    Python 3.6
    Statistics:
    attempts: 10
    list avg time: 0.2640914678573608 
    vector avg time: 2.5774293661117555 

    Any idea what could be improved in the coparator to make it work faster?

    Only registered users can participate in the survey. Sign in , please.

    How do you optimize your Python applications?


    Also popular now: