Modifying function bytecode in Python

Some time ago I had to solve a rather unusual problem: adding a non-standard operator to Python. The task was to generate Python code from assembler-like pseudocode that contains a goto operator. I did not want to write a complex lexical analyzer. In the pseudocode, goto was used to organize loops and conditional jumps, and I wanted some analogue of it in Python, which has none.

There is a module released as an April Fools' joke, but it did not work for me. Let me say right away that I am aware of the drawbacks of this operator, but in some cases, when code is generated automatically, it greatly simplifies the programmer's life. In addition, the approach described here lets you apply any code modification you need; it is illustrated below using the goto operator as the example.

So the problem is how to add a couple of new commands to Python and make it interpret them correctly (jump to the right addresses). To do this, we will write a decorator that is applied to the function in which we want to use goto and labels. It relies on the dis module, which lets you work with Python bytecode, and the new module, which lets you create internal Python objects dynamically.

For starters, let's decide on the command format. Because of Python's syntax restrictions, commands of the form

a:
goto a


are impossible. However, Python does allow constructs of the form

label .a
goto .a


Note that the dot plays an important role here: Python ignores the space before it, so each construct is simply attribute access (label.a) on a global name. Without the dot, the code is a syntax error. Now let's look at the bytecode of these commands by running the following code:

>>> def f():
...     label .a
...     goto .a
...
>>> import dis
>>> dis.dis( f )
  2           0 LOAD_GLOBAL              0 (label)
              3 LOAD_ATTR                1 (a)
              6 POP_TOP
  3           7 LOAD_GLOBAL              2 (goto)
             10 LOAD_ATTR                1 (a)
             13 POP_TOP
             14 LOAD_CONST               0 (None)
             17 RETURN_VALUE
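The remark about the dot can also be checked at the syntax level, without looking at bytecode. The following is a small sketch using the ast module; the helper name parse_statement is hypothetical, and the print calls assume Python 3. It shows that label .a parses as an ordinary attribute access, while the dot-less forms are rejected by the parser.

```python
import ast


def parse_statement(source):
    """Return the AST node of a one-statement source, or None on a syntax error."""
    try:
        return ast.parse(source).body[0]
    except SyntaxError:
        return None


node = parse_statement("label .a")
# The statement is an expression whose value is attribute access on a name.
print(type(node.value).__name__, node.value.value.id, node.value.attr)

# Without the dot, or with a trailing colon, the parser rejects the code.
for source in ("label a", "goto a", "a:"):
    print(repr(source), parse_statement(source))
```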


Thus, both the label declaration and the jump command reduce to three operations, LOAD_GLOBAL, LOAD_ATTR and POP_TOP, of which the first two carry the information we need. The dis module lets you look up an instruction's opcode by name with the opmap dictionary and get the name back from the opcode with the opname list:

>>> dis.opmap[ 'LOAD_GLOBAL' ]
116
>>> dis.opmap[ 'LOAD_ATTR' ]
106
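Opcode numbers differ between interpreter versions, while instruction names change far less often, which is why the decorator looks opcodes up by name instead of hard-coding them. A short sketch of the round trip between the two tables; the numbers it prints depend on the interpreter that runs it:

```python
import dis

for name in ('LOAD_GLOBAL', 'LOAD_ATTR', 'POP_TOP', 'NOP'):
    # opmap: instruction name -> opcode number for this interpreter.
    code = dis.opmap[name]
    # opname is indexed by opcode number, so it maps the number back to the name.
    assert dis.opname[code] == name
    print(name, code)
```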


The bytecode of f is stored as a string in f.func_code.co_code, and the names it refers to (global variables and attributes) are stored in the tuple f.func_code.co_names:

>>> f.func_code.co_names
('label', 'a', 'goto')
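In Python 3 the func_code attribute is spelled __code__ (Python 2.6 and 2.7 also provide __code__ as an alias), and the bytecode itself looks different, but the names table of the same function can be inspected in the same way. A minimal sketch; f is never called, because label and goto are not defined anywhere:

```python
def f():
    # This compiles: label and goto are ordinary global names looked up at run time.
    # Calling f() would raise NameError, since neither name is defined.
    label .a
    goto .a


# Names of globals and attributes used by f, in order of first use.
print(f.__code__.co_names)
```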


Now a little about the byte representation of the commands we are interested in. The disassembly above shows that LOAD_GLOBAL and LOAD_ATTR each occupy three bytes (the offsets are shown on the left). The first byte is the opcode (from opmap); the second and third hold the argument (low byte first, then high byte), which is an index into f.func_code.co_names identifying the variable or attribute being loaded.
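The two-byte argument is plain little-endian arithmetic. A small sketch with two hypothetical helpers, encode_arg and decode_arg, that split an argument into the (low, high) byte pair and rebuild it; arguments larger than 0xFFFF need an EXTENDED_ARG prefix, which neither these helpers nor the decorator handle:

```python
def encode_arg(value):
    """Split a 16-bit instruction argument into (low byte, high byte)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("argument does not fit in two bytes: %d" % value)
    return value & 0xFF, (value >> 8) & 0xFF


def decode_arg(lo_byte, hi_byte):
    """Rebuild the argument from its two bytes (the inverse of encode_arg)."""
    return (hi_byte << 8) | lo_byte


print(encode_arg(1))      # (1, 0)
print(encode_arg(258))    # (2, 1), because 258 == 0x0102
print(decode_arg(2, 1))   # 258
```

Because the low and high parts never share bits, combining them with ^ (as the decorator below does) gives the same result as |.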

You can tell whether an instruction takes an argument (and hence how many bytes it occupies) by comparing its opcode with dis.HAVE_ARGUMENT: if the opcode is greater than or equal to this constant, the instruction has a two-byte argument and is three bytes long; otherwise it is a single byte. This is all we need to walk through a function's bytecode. Next, we replace the bytes of each label with NOP operations, and the first instruction of each goto with JUMP_ABSOLUTE, whose argument is the absolute offset of the jump target within the function. That is practically all there is to it. The decorator code and a usage example are given below.

import dis, new
class MissingLabelError( Exception ):
    pass
class ExistingLabelError( Exception ):
    pass
def goto( function ):
    labels_dict = {}
    gotos_list = []
    command_name = ''
    previous_operation = ''
    i = 0
    # first pass: walk the bytecode and record the offsets of label and goto commands
    while i < len( function.func_code.co_code ):
        operation_code = ord( function.func_code.co_code[ i ] )
        operation_name = dis.opname[ operation_code ]
        if operation_code >= dis.HAVE_ARGUMENT:
            lo_byte = ord( function.func_code.co_code[ i + 1 ] )
            hi_byte = ord( function.func_code.co_code[ i + 2 ] )
            argument_position = ( hi_byte << 8 ) | lo_byte
            if operation_name == 'LOAD_GLOBAL':
                command_name = function.func_code.co_names[ argument_position ]
            if operation_name == 'LOAD_ATTR' and previous_operation == 'LOAD_GLOBAL':
                if command_name == 'label':
                    label = function.func_code.co_names[ argument_position ]
                    if label in labels_dict:
                        raise ExistingLabelError( 'Label redefinition: %s' % label )
                    # the label command starts at the preceding LOAD_GLOBAL
                    labels_dict[ label ] = i - 3
                elif command_name == 'goto':
                    gotos_list += [ ( function.func_code.co_names[ argument_position ], i - 3 ) ]
            i += 3
        else:
            i += 1
        previous_operation = operation_name
    codebytes_list = list( function.func_code.co_code )
    # replace the 7 consecutive bytes of LOAD_GLOBAL, LOAD_ATTR and POP_TOP with NOP
    for label, index in labels_dict.items():
        codebytes_list[ index : index + 7 ] = [ chr( dis.opmap[ 'NOP' ] ) ] * 7
    # replace the LOAD_GLOBAL of each goto with JUMP_ABSOLUTE to the code after the label
    for label, index in gotos_list:
        if label not in labels_dict:
            raise MissingLabelError( 'Missing label: %s' % label )
        target_index = labels_dict[ label ] + 7
        codebytes_list[ index ] = chr( dis.opmap[ 'JUMP_ABSOLUTE' ] )
        codebytes_list[ index + 1 ] = chr( target_index & 0xFF )
        codebytes_list[ index + 2 ] = chr( ( target_index >> 8 ) & 0xFF )
    # build the code object for the new function
    code = function.func_code
    new_code = new.code( code.co_argcount, code.co_nlocals, code.co_stacksize, code.co_flags,
        ''.join( codebytes_list ), code.co_consts, code.co_names, code.co_varnames,
        code.co_filename, code.co_name, code.co_firstlineno, code.co_lnotab,
        code.co_freevars, code.co_cellvars )
    # create the new function, keeping its name, default arguments and closure
    new_function = new.function( new_code, function.func_globals, function.func_name,
        function.func_defaults, function.func_closure )
    return new_function
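Both passes of the decorator can be followed without a Python 2 interpreter by running them on a hand-assembled byte string. The sketch below is a Python 3 re-implementation for illustration only: the opcode numbers are CPython 2.7 values hard-coded into a hypothetical PY27_OPMAP table, the helpers parse and patch_goto are hypothetical names, and plain ValueError stands in for the decorator's exception classes.

```python
# CPython 2.7 opcode numbers, hard-coded so the sketch runs on any interpreter.
PY27_OPMAP = {'POP_TOP': 1, 'NOP': 9, 'RETURN_VALUE': 83, 'LOAD_CONST': 100,
              'LOAD_ATTR': 106, 'JUMP_ABSOLUTE': 113, 'LOAD_GLOBAL': 116}
PY27_OPNAME = {code: name for name, code in PY27_OPMAP.items()}
PY27_HAVE_ARGUMENT = 90
LABEL_SIZE = 7  # LOAD_GLOBAL (3 bytes) + LOAD_ATTR (3 bytes) + POP_TOP (1 byte)


def parse(co_code):
    """Yield (offset, opname, argument) from Python 2 style bytecode."""
    i = 0
    while i < len(co_code):
        op = co_code[i]
        if op >= PY27_HAVE_ARGUMENT:
            # Three-byte instruction: opcode, low byte, high byte.
            yield i, PY27_OPNAME[op], co_code[i + 1] | (co_code[i + 2] << 8)
            i += 3
        else:
            # One-byte instruction without an argument.
            yield i, PY27_OPNAME[op], None
            i += 1


def patch_goto(co_code, co_names):
    """Return co_code with label/goto commands rewritten like the decorator does."""
    instructions = list(parse(co_code))
    labels, gotos = {}, []
    # First pass: a LOAD_GLOBAL immediately followed by LOAD_ATTR is a command.
    for (start, prev_op, prev_arg), (_, op, arg) in zip(instructions, instructions[1:]):
        if prev_op == 'LOAD_GLOBAL' and op == 'LOAD_ATTR':
            command, name = co_names[prev_arg], co_names[arg]
            if command == 'label':
                if name in labels:
                    raise ValueError('label redefinition: %s' % name)
                labels[name] = start
            elif command == 'goto':
                gotos.append((name, start))
    code = bytearray(co_code)
    for start in labels.values():
        # Each label becomes seven NOPs, so no other offset moves.
        code[start:start + LABEL_SIZE] = bytes([PY27_OPMAP['NOP']]) * LABEL_SIZE
    for name, start in gotos:
        if name not in labels:
            raise ValueError('missing label: %s' % name)
        target = labels[name] + LABEL_SIZE
        # Overwrite LOAD_GLOBAL with JUMP_ABSOLUTE; the remaining 4 bytes are dead code.
        code[start:start + 3] = bytes([PY27_OPMAP['JUMP_ABSOLUTE'],
                                       target & 0xFF, (target >> 8) & 0xFF])
    return bytes(code)


# Hand-assembled Python 2.7 bytecode matching the dis listing of f shown earlier.
F_CODE = bytes([116, 0, 0, 106, 1, 0, 1,    # label .a
                116, 2, 0, 106, 1, 0, 1,    # goto .a
                100, 0, 0, 83])             # return None
F_NAMES = ('label', 'a', 'goto')

for offset, name, arg in parse(patch_goto(F_CODE, F_NAMES)):
    print(offset, name, '' if arg is None else arg)
```

The patched listing starts with seven NOPs at offsets 0-6, followed by JUMP_ABSOLUTE 7 at offset 7. Since goto .a comes right after label .a in f, the jump targets itself, which is exactly the infinite loop the source describes.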


Usage example:

@goto
def test_function( n ):
    goto .label1
    label .label2
    print n
    goto .label3
    label .label1
    print n
    n -= 1
    if n != 0:
        goto .label1
    else:
        goto .label2
    label .label3
    print 'the end'
test_function( 10 )


The result of the example:

10
9
8
7
6
5
4
3
2
1
0
the end


In conclusion, I want to add that this solution does not quite fit the general style of Python: it is not very robust, because it depends heavily on the interpreter version (Python 2.7 was used here, but it should work on other 2.x versions as well). Still, solving this problem shows once again how flexible the language is and how new functionality can be added when it is needed.
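As an illustration of that version dependence: Python 3 removed the new module and switched to two-byte wordcode in 3.6, and its jump instructions have changed since then, so the decorator does not port as-is. Only the first, read-only pass carries over easily. A sketch using dis.get_instructions (available since Python 3.4); find_labels_and_gotos is a hypothetical name, and no bytecode is patched here:

```python
import dis


def find_labels_and_gotos(function):
    """Return ({label: offset}, [(target, offset)]) for label/goto commands."""
    instructions = list(dis.get_instructions(function))
    labels, gotos = {}, []
    # A command is a LOAD_GLOBAL of 'label' or 'goto' followed directly by LOAD_ATTR.
    for prev, cur in zip(instructions, instructions[1:]):
        if prev.opname == 'LOAD_GLOBAL' and cur.opname == 'LOAD_ATTR':
            if prev.argval == 'label':
                labels[cur.argval] = prev.offset
            elif prev.argval == 'goto':
                gotos.append((cur.argval, prev.offset))
    return labels, gotos


def g():
    # Never called: label and goto are undefined, only the bytecode is inspected.
    label .start
    goto .start


print(find_labels_and_gotos(g))
```

The offsets it reports depend on the interpreter version; only the label and target names are stable.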
