@pythonetc compilation, February 2019


    This is the ninth collection of Python tips and programming notes from my @pythonetc feed.


    Structural Comparison


    Sometimes when testing you need to compare complex structures while ignoring some values. This is usually done by comparing specific values from the structure:

    >>> d = dict(a=1, b=2, c=3)
    >>> assert d['a'] == 1
    >>> assert d['c'] == 3

    However, you can create a special value that is equal to any other:

    >>> assert d == dict(a=1, b=ANY, c=3)

    This is easily done using the magic method __eq__:

    >>> class AnyClass:
    ...     def __eq__(self, another):
    ...         return True
    ...
    >>> ANY = AnyClass()
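
    The standard library ships an object of this kind as unittest.mock.ANY. Below is a minimal sketch of how it might be used on nested data. The order structure and its field names are made up for illustration.

```python
from unittest.mock import ANY

# Hypothetical structure under test: some fields are unpredictable.
order = {
    'id': 42,
    'created_at': 1550000000.123,
    'items': [{'sku': 'A1', 'uuid': 'f3b1'}],
}

# ANY compares equal to anything, so it also works inside nested containers.
expected = {
    'id': 42,
    'created_at': ANY,
    'items': [{'sku': 'A1', 'uuid': ANY}],
}

matches = order == expected
```

    The fields that are spelled out are still checked, so a wrong id or sku makes the comparison fail.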

    stdout


    sys.stdout is a wrapper that lets you write strings rather than bytes. These strings are automatically encoded using sys.stdout.encoding:

    >>> sys.stdout.write('Straße\n')
    Straße
    >>> sys.stdout.encoding
    'UTF-8'

    sys.stdout.encoding is read-only and equal to the default encoding, which can be configured using the environment variable PYTHONIOENCODING:

    $ PYTHONIOENCODING=cp1251 python3
    Python 3.6.6 (default, Aug 13 2018, 18:24:23)
    [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.stdout.encoding
    'cp1251'

    If you want to write bytes to stdout, you can skip the automatic encoding by accessing sys.stdout.buffer, the buffer that the wrapper is built on:

    >>> sys.stdout
    <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1251'>
    >>> sys.stdout.buffer
    <_io.BufferedWriter name='<stdout>'>
    >>> sys.stdout.buffer.write(b'Stra\xc3\x9fe\n')
    Straße

    sys.stdout.buffer is also a wrapper. You can bypass it by accessing the underlying file descriptor through sys.stdout.buffer.raw:

    >>> sys.stdout.buffer.raw.write(b'Stra\xc3\x9fe')
    Straße
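
    The same layering can be reproduced without touching the real stdout. The sketch below is an illustration only: it assumes an in-memory io.BytesIO as a stand-in for the byte stream, and the variable names are invented.

```python
import io

# In-memory byte stream standing in for the bytes behind stdout (assumption).
raw = io.BytesIO()
# newline='\n' disables platform-specific newline translation.
text_stream = io.TextIOWrapper(raw, encoding='utf-8', newline='\n')

# Strings go through the wrapper and are encoded on the way down.
text_stream.write('Straße\n')
text_stream.flush()
encoded = raw.getvalue()

# Bytes written to .buffer skip the encoding step entirely.
text_stream.buffer.write(b'Stra\xc3\x9fe\n')
both = raw.getvalue()
```

    Here text_stream.buffer is the BytesIO object itself, so both writes end up in the same byte stream.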

    Constant ellipsis


    Python has very few built-in constants. One of them, Ellipsis, can also be written as .... The interpreter attaches no particular meaning to this constant, but it is used wherever such syntax is appropriate.

    numpy supports Ellipsis as an argument to __getitem__; for example, x[...] returns all elements of x.

    PEP 484 gives this constant another meaning: Callable[..., type] lets you describe the type of a callable without specifying the types of its arguments.

    Finally, you can use ...to indicate that a function has not yet been implemented. This is completely correct Python code:

    def x():
        ...

    However, in Python 2 you cannot write Ellipsis as .... The one exception is a[...], which is interpreted as a[Ellipsis].

    All of the following lines are valid in Python 3, but only the first one is valid in Python 2:

    a[...]
    a[...:2:...]
    [..., ...]
    {...:...}
    a = ...
    ... is ...
    def a(x=...): ...
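
    A small sketch of the uses mentioned above, for Python 3. The Echo class and the call_with function are hypothetical helpers, not part of any library.

```python
from typing import Callable


class Echo:
    """Hypothetical container that returns whatever index it receives."""

    def __getitem__(self, key):
        return key


echo = Echo()
whole = echo[...]      # the key is the Ellipsis singleton
mixed = echo[..., 0]   # the key is a tuple containing Ellipsis


def call_with(func: Callable[..., int], *args) -> int:
    # Callable[..., int]: any argument list, but the result must be an int.
    return func(*args)


total = call_with(lambda a, b: a + b, 2, 3)
```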

    Re-import modules


    Modules that have already been imported are not loaded again. The statement import foo simply does nothing. However, re-importing modules can be useful when working in an interactive environment. In Python 3.4+, use importlib for this:

    In [1]: import importlib
    In [2]: with open('foo.py', 'w') as f:
       ...:     f.write('a = 1')
       ...:
    In [3]: import foo
    In [4]: foo.a
    Out[4]: 1
    In [5]: with open('foo.py', 'w') as f:
       ...:     f.write('a = 2')
       ...:
    In [6]: foo.a
    Out[6]: 1
    In [7]: import foo
    In [8]: foo.a
    Out[8]: 1
    In [9]: importlib.reload(foo)
    Out[9]: 
    In [10]: foo.a
    Out[10]: 2
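
    The same session can be sketched as a standalone script. The module name reload_demo_mod and the temporary directory are made up for the example. Bytecode writing is switched off as a precaution, on the assumption that a file rewritten within the same second could otherwise be masked by a cached .pyc.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Precaution (assumption): avoid a stale .pyc hiding the quick rewrite below.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / 'reload_demo_mod.py'  # hypothetical module
    module_path.write_text('a = 1\n')
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        import reload_demo_mod
        first = reload_demo_mod.a

        module_path.write_text('a = 2\n')
        import reload_demo_mod  # already imported: nothing happens
        after_import = reload_demo_mod.a

        importlib.reload(reload_demo_mod)  # re-executes the module source
        after_reload = reload_demo_mod.a
    finally:
        sys.path.remove(tmp)
        sys.modules.pop('reload_demo_mod', None)
```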

    IPython also has an autoreload extension that automatically re-imports modules when necessary:

    In [1]: %load_ext autoreload
    In [2]: %autoreload 2
    In [3]: with open('foo.py', 'w') as f:
       ...:     f.write('print("LOADED"); a=1')
       ...:
    In [4]: import foo
    LOADED
    In [5]: foo.a
    Out[5]: 1
    In [6]: with open('foo.py', 'w') as f:
       ...:     f.write('print("LOADED"); a=2')
       ...:
    In [7]: import foo
    LOADED
    In [8]: foo.a
    Out[8]: 2
    In [9]: with open('foo.py', 'w') as f:
       ...:     f.write('print("LOADED"); a=3')
       ...:
    In [10]: foo.a
    LOADED
    Out[10]: 3

    \G


    In some languages you can use the \G assertion in regular expressions. It matches at the position where the previous match ended. This makes it possible to write finite-state machines that process a string token by token (where a token is defined by a regular expression).

    Python has nothing like this assertion, but you can implement similar functionality by tracking the position manually and passing part of the string to the regular expression functions:

    import re
    import json
    text = '<a><b>foo</b><c>bar</c></a><z>bar</z>'
    regex = '^(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))'
    stack = []
    tree = []
    pos = 0
    while len(text) > pos:
        error = f'Error at {text[pos:]}'
        found = re.search(regex, text[pos:])
        assert found, error
        pos += len(found[0])
        start, stop, data = found.groups()
        if start:
            tree.append(dict(
                tag=start,
                children=[],
            ))
            stack.append(tree)
            tree = tree[-1]['children']
        elif stop:
            tree = stack.pop()
            assert tree[-1]['tag'] == stop, error
            if not tree[-1]['children']:
                tree[-1].pop('children')
        elif data:
            stack[-1][-1]['data'] = data
    print(json.dumps(tree, indent=4))

    In the example above, you can save processing time by not slicing the string over and over, and instead asking the re module to start searching from a given position.

    This requires a few changes to the code. First, re.search does not accept a starting position, so you have to compile the regular expression manually. Second, ^ matches the beginning of the string rather than the position where the search starts, so you have to check manually that the match begins at that position.

    import re
    import json
    text = '<a><b>foo</b><c>bar</c></a><z>bar</z>' * 10
    def print_tree(tree):
       print(json.dumps(tree, indent=4))
    def xml_to_tree_slow(text):
       regex = '^(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))'
       stack = []
       tree = []
       pos = 0
       while len(text) > pos:
           error = f'Error at {text[pos:]}'
           found = re.search(regex, text[pos:])
           assert found, error
           pos += len(found[0])
           start, stop, data = found.groups()
           if start:
               tree.append(dict(
                   tag=start,
                   children=[],
               ))
               stack.append(tree)
               tree = tree[-1]['children']
           elif stop:
               tree = stack.pop()
               assert tree[-1]['tag'] == stop, error
               if not tree[-1]['children']:
                   tree[-1].pop('children')
           elif data:
               stack[-1][-1]['data'] = data
       return tree
    _regex = re.compile('(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))')
    def _error_message(text, pos):
       return text[pos:]
    def xml_to_tree_fast(text):
       stack = []
       tree = []
       pos = 0
       while len(text) > pos:
           error = f'Error at {text[pos:]}'
           found = _regex.search(text, pos=pos)
           # Check for a match first: found is None when nothing matches.
           assert found, _error_message(text, pos)
           begin, end = found.span(0)
           assert begin == pos, _error_message(text, pos)
           pos += len(found[0])
           start, stop, data = found.groups()
           if start:
               tree.append(dict(
                   tag=start,
                   children=[],
               ))
               stack.append(tree)
               tree = tree[-1]['children']
           elif stop:
               tree = stack.pop()
               assert tree[-1]['tag'] == stop, _error_message(text, pos)
               if not tree[-1]['children']:
                   tree[-1].pop('children')
           elif data:
               stack[-1][-1]['data'] = data
       return tree
    print_tree(xml_to_tree_fast(text))

    Results:

    In [1]: from example import *
    In [2]: %timeit xml_to_tree_slow(text)
    356 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    In [3]: %timeit xml_to_tree_fast(text)
    294 µs ± 6.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
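
    A possible simplification, not taken from the original post: compiled patterns also have a match(string, pos) method, which succeeds only if the match starts exactly at pos, so the manual position check is not needed. The sketch below only splits the input into tokens; the TOKEN pattern and the tokenize function are illustrative names.

```python
import re

# Same three alternatives as above: opening tag, closing tag, text data.
TOKEN = re.compile(r'(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))')


def tokenize(text):
    """Split text into (kind, value) pairs, scanning from left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        # match() anchors at pos, much like \G does in other regex dialects.
        found = TOKEN.match(text, pos)
        if found is None:
            raise ValueError(f'Error at {text[pos:]}')
        start, stop, data = found.groups()
        if start:
            tokens.append(('start', start))
        elif stop:
            tokens.append(('stop', stop))
        else:
            tokens.append(('data', data))
        pos = found.end()
    return tokens


tokens = tokenize('<a>foo</a>')
```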

    Rounding numbers


    This item was written by orsinium, author of the Telegram channel @itgram_channel.

    The round function rounds a number to the specified number of decimal places.

    >>> round(1.2)
    1
    >>> round(1.8)
    2
    >>> round(1.228, 1)
    1.2

    You can also specify a negative precision:

    >>> round(413.77, -1)
    410.0
    >>> round(413.77, -2)
    400.0
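
    When a precision is given, round returns a value of the same type as the input number:

    >>> from decimal import Decimal
    >>> from fractions import Fraction
    >>> type(round(2, 1))
    <class 'int'>
    >>> type(round(2.0, 1))
    <class 'float'>
    >>> type(round(Decimal(2), 1))
    <class 'decimal.Decimal'>
    >>> type(round(Fraction(2), 1))
    <class 'fractions.Fraction'>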

    For your own classes, you can define how round behaves using the __round__ method:

    >>> class Number(int):
    ...   def __round__(self, p=-1000):
    ...     return p
    ...
    >>> round(Number(2))
    -1000
    >>> round(Number(2), -2)
    -2
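
    A slightly more practical sketch of __round__. The Celsius class is hypothetical and simply delegates to the rounding of the number it wraps.

```python
class Celsius:
    """Hypothetical temperature wrapper used only for illustration."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __round__(self, ndigits=None):
        # round(obj) calls obj.__round__(); round(obj, n) calls obj.__round__(n).
        return Celsius(round(self.degrees, ndigits))


one_digit = round(Celsius(21.46), 1)
no_digits = round(Celsius(21.46))
```

    Returning a new Celsius keeps the result in the same type, in line with what the built-in numeric types do when a precision is given.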

    Values are rounded to the nearest multiple of 10 ** (-precision). For example, with precision=1 the value is rounded to a multiple of 0.1: round(0.63, 1) returns 0.6. If two multiples are equally close, the one with an even last digit is chosen:

    >>> round(0.5)
    0
    >>> round(1.5)
    2
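
    A short illustration of the tie-breaking rule. It uses only values that floats represent exactly, plus integers with a negative precision.

```python
# Halves are exactly representable as floats, so these are true ties.
halves = [0.5, 1.5, 2.5, 3.5]
rounded = [round(x) for x in halves]

# The same rule applies to integers rounded to tens.
tens = [round(n, -1) for n in (15, 25, 35)]
```

    Both 1.5 and 2.5 round to 2, and both 15 and 25 round to 20, because ties go to the even neighbour.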

    Sometimes rounding a floating point number gives an unexpected result. By the round-half-to-even rule you might expect 2.8 here:

    >>> round(2.85, 1)
    2.9

    The reason is that most decimal fractions cannot be represented exactly as floating point numbers (https://docs.python.org/3.7/tutorial/floatingpoint.html):

    >>> format(2.85, '.64f')
    '2.8500000000000000888178419700125232338905334472656250000000000000'

    If you want to round half up, use decimal.Decimal:

    >>> from decimal import Decimal, ROUND_HALF_UP
    >>> Decimal(1.5).quantize(0, ROUND_HALF_UP)
    Decimal('2')
    >>> Decimal(2.85).quantize(Decimal('1.0'), ROUND_HALF_UP)
    Decimal('2.9')
    >>> Decimal(2.84).quantize(Decimal('1.0'), ROUND_HALF_UP)
    Decimal('2.8')
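
    One caveat worth sketching: a Decimal built from a float inherits the float's representation error, while a Decimal built from a string holds the exact decimal value. The value 2.675 is chosen for the illustration because its float representation lies slightly below the tie.

```python
from decimal import Decimal, ROUND_HALF_UP

cent = Decimal('0.01')

# Built-in round works on the float, which is slightly below 2.675.
builtin = round(2.675, 2)

# Decimal(2.675) copies the float exactly, error included.
from_float = Decimal(2.675).quantize(cent, ROUND_HALF_UP)

# Decimal('2.675') is the exact decimal value, so the tie rounds up.
from_string = Decimal('2.675').quantize(cent, ROUND_HALF_UP)
```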
