Python transpiler chain → 11l → C ++ [to speed up Python code and more]




    This article discusses the most interesting transformations that a chain of two transpilers (the first translates the Python code into the code in the new 11l programming language , and the second - the 11l code in C ++), and compares the performance with other acceleration tools / code execution in Python (PyPy, Cython, Nuitka).

    Replacing "slices" \ slices on ranges \ ranges

    Python11l
    s[-1]
    s[-2]
    s[:-1]
    s[1:]
    s[:1:]
    s[1:2]
    s[::2]
    s[3:10:2]
    s[3:10:]
    
    s.last
    s[(len)-2]
    s[0..<(len)-1]
    s[1..]
    s[0..<1]
    s[1..<2]
    s[(0..).step(2)]
    s[(3..<10).step(2)]
    s[3..<10]
    
    An explicit indication for indexing from the end of the array s[(len)-2]instead of just s[-2]need to exclude the following errors:
    1. When it is required, for example, to get the previous character by s[i-1], but if i = 0, such / this record instead of an error will silently return the last character of the string [ and I encountered this error in practice - commit ] .
    2. Expression s[i:]after i = s.find(":")will not work properly when the character is not found in the string [ instead of '' part of the string from the first character :and then '' will be taken the last character string ] (and, in general, return -1function find()in Python-is, I believe is also wrong [ should return null / None [ and if -1 is required, then you should write explicitly: i = s.find(":") ?? -1] ] ).
    3. Writing s[-n:]to get the nlast characters of the string will not work correctly when n = 0.

    Chains of comparison operators


    At first glance, an outstanding feature of the Python language, but in practice it can be easily abandoned / dispensed with using the operator inand ranges:
    a < b < cb in a<..<c
    a <= b < cb in a..<c
    a < b <= cb in a<..c
    0 <= b <= 9b in 0..9

    List inclusion (list comprehension)


    Similarly, as it turned out, you can opt out of another interesting Python feature - list comprehensions.
    While some glorify list comprehension and even offer to give up `filter ()` and `map ()` , I found that:
    1. In all places where I met Python's list comprehension, you can easily get away with the functions `filter ()` and `map ()`.
      dirs[:] = [d for d in dirs if d[0] != '.'and d != exclude_dir]
      dirs[:] = filter(lambda d: d[0] != '.'and d != exclude_dir, dirs)
      '[' + ', '.join(python_types_to_11l[ty] for ty in self.type_args) + ']''[' + ', '.join(map(lambda ty: python_types_to_11l[ty], self.type_args)) + ']'# Nested list comprehension:
      matrix = [
          [1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
      ]
      [[row[i] for row in matrix] for i in range(4)]
      list(map(lambda i: list(map(lambda row: row[i], matrix)), range(4)))
      
    2. `filter ()` and `map ()` in 11l look prettier than in Python
      dirs[:] = filter(lambda d: d[0] != '.' and d != exclude_dir, dirs)
      dirs = dirs.filter(d -> d[0] != ‘.’ & d != @exclude_dir)
      '[' + ', '.join(map(lambda ty: python_types_to_11l[ty], self.type_args)) + ']'
      ‘[’(.type_args.map(ty -> :python_types_to_11l[ty]).join(‘, ’))‘]’
      outfile.write("\n".join(x[1] for x in fileslist if x[0]))
      outfile.write("\n".join(map(lambda x: x[1], filter(lambda x: x[0], fileslist))))
      outfile.write(fileslist.filter(x -> x[0]).map(x -> x[1]).join("\n"))
      
      and therefore the need for list comprehensions in 11l is virtually eliminated [the replacement of list comprehension with filter()and / or map()is done automatically in the process of converting Python code to 11l ] .

    Convert the if-elif-else chain to switch


    While Python does not contain a switch statement, this is one of the most beautiful constructions in the 11l language, and so I decided to insert the switch automatically:
    Python11l
    ch = instr[i]
    if ch == "[":
        nesting_level += 1elif ch == "]":
        nesting_level -= 1if nesting_level == 0:
            breakelif ch == "‘":
        ending_tags.append('’') # ‘‘elif ch == "’":
        assert(ending_tags.pop() == '’')
    
    switch instr[i]
        ‘[’
            nesting_level++
        ‘]’
            if --nesting_level == 0
                loop.break
        "‘"
            ending_tags.append("’") // ‘‘
        "’"
            assert(ending_tags.pop() == "’")
    
    For completeness, here is the generated C ++ code.
    switch (instr[i])
    {
    case u'[':
        nesting_level++;
        break;
    case u']':
        if (--nesting_level == 0)
            goto break_;
        break;
    case u'‘':
        ending_tags.append(u"’"_S);
        break; // ‘‘
    case u'’':
        assert(ending_tags.pop() == u'’');
        break;
    }
    


    Converting small dictionaries to native code


    Consider this line of Python code:
    tag = {'*':'b', '_':'u', '-':'s', '~':'i'}[prev_char()]
    
    Most likely, this form of recording is not very effective [ in terms of performance ] , but very convenient.

    In 11l, the entry corresponding to this line [ and received by the Python transpiler → 11l ] is not only convenient [ however, not as elegant as in Python ] , but also fast:
    var tag = switch prev_char() {‘*’ {‘b’}; ‘_’ {‘u’}; ‘-’ {‘s’}; ‘~’ {‘i’}}
    

    The line above is shipped to:
    auto tag = [&](const auto &a){return a == u'*' ? u'b'_C : a == u'_' ? u'u'_C : a == u'-' ? u's'_C : a == u'~' ? u'i'_C : throw KeyError(a);}(prev_char());
    
    [ A call to the lambda function, the C ++ compiler will embed \ inline during the optimization process and only the chain of operators will remain ?/:. ]

    When assigning a variable, the dictionary is left as is:
    Python
    rn = {'I': 1, 'V': 5, 'X': 10, 'L': 50, ...}
    11l
    var rn = [‘I’ = 1, ‘V’ = 5, ‘X’ = 10, ‘L’ = 50, ...]
    C ++
    auto rn = create_dict(dict_of(u'I'_C, 1)(u'V'_C, 5)(u'X'_C, 10)(u'L'_C, 50)...);

    Capture \ Ca pture external variables


    In Python, to indicate that a variable is not local, but should be taken outside [ of the current function ] , the nonlocal keyword is used [ otherwise, for example, it found = Truewill be interpreted as creating a new local variable found, rather than assigning a value to an already existing external variable ] .
    In 11l, the prefix @ is used for this:
    Python11l
    writepos = 0defwrite_to_pos(pos, npos):nonlocal writepos
        outfile.write(...)
        writepos = npos
    
    var writepos = 0
    fn write_to_pos(pos, npos)
        @outfile.write(...)
        @writepos = npos
    
    C ++:
    auto writepos = 0;
    auto write_to_pos = [..., &outfile, &writepos](const auto &pos, const auto &npos)
    {
        outfile.write(...);
        writepos = npos;
    };
    

    Global variables


    Similar to external variables, if you forget to declare a global variable in Python [ via the global keyword ] , you get an inconspicuous bug:
    break_label_index = -1
    ...
    defparse(tokens, source_):global source, tokeni, token, scope
        source = source_
        tokeni = -1
        token = None
        break_label_index = -1
        scope = Scope(None)
        ...
    
    var break_label_index = -1
    ...
    fn parse(tokens, source_)
        :source = source_
        :tokeni = -1
        :token = null
        break_label_index = -1
        :scope = Scope(null)
        ...
    
    The code at 11l [ on the right ], as opposed to Python [ on the left ], will produce an 'undeclared variable break_label_index' error at compile time .

    Index / number of the current container item


    I always forget the order of the variables that the Python function returns enumerate{first comes the value and then the index or vice versa}. Analogous behavior in Ruby - - each.with_indexis much easier to remember: with index means that index comes after value, and not before. But in 11l, the logic is even easier to memorize:
    Python11l
    items = ['A', 'B', 'C']
    for index, item in enumerate(items):
        print(str(index) + ' = ' + item)
    
    var items = [‘A’, ‘B’, ‘C’]
    loop(item) items
       print(loop.index‘ = ’item)
    

    Performance


    As a test, the program uses the conversion of PC markup to HTML , and the source data is taken from the source code of the article on PC markup [ since this article is currently the largest written on PC markup ] , and is repeated 10 times, i.e. It turns out from 48.8 kilobyte article a file of 488Kb in size.

    Here is a diagram showing how many times the appropriate way to execute Python code is faster than the original [ CPython ] implementation :

    And now let's add to the diagram the implementation generated by the Python transpiler → 11l → C ++:

    Время выполнения [время преобразования файла размером 488Кб] составило 868 мс для CPython и 38 мс для сгенерированного C++ кода [это время включает в себя полноценный [т.е. не просто работу с данными в оперативной памяти] запуск программы операционной системой и весь ввод/вывод [чтение исходного файла [.pq] и сохранение нового файла [.html] на диск]].

    Я хотел ещё попробовать Shed Skin, но он не поддерживает локальные функции.
    Numba использовать также не получилось (выдаёт ошибку ‘Use of unknown opcode LOAD_BUILD_CLASS’).
    Вот архив с использовавшейся программой для сравнения производительности [под Windows] (требуются установленный Python 3.6 или выше и следующие Python-пакеты: pywin32, cython).

    Python source and output of Python → 11l and 11l → C ++ transpilers:
    PythonGenerated 11l
    (with keywords instead of letters)
    11l
    (with letters)
    C ++ generated

    Also popular now: