
@pythonetc compilation, February 2019

This is the ninth collection of Python and programming tips from my @pythonetc feed.
Structural Comparison
Sometimes when testing it is necessary to compare complex structures, ignoring some values. This can usually be done by comparing specific values from such a structure:
>>> d = dict(a=1, b=2, c=3)
>>> assert d['a'] == 1
>>> assert d['c'] == 3
However, you can create a special value that is equal to any other:
>>> assert d == dict(a=1, b=ANY, c=3)
This is easily done using the magic method __eq__:
>>> class AnyClass:
...     def __eq__(self, another):
...         return True
...
>>> ANY = AnyClass()
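The standard library ships an object that behaves the same way, unittest.mock.ANY. The sketch below uses it to compare a nested structure while ignoring two volatile fields; the response and expected dictionaries are made-up sample data, not part of the original example.

```python
from unittest.mock import ANY

# Hypothetical data: "id" and "created_at" change on every run,
# so the test only pins down the values that matter.
response = {
    "id": 17,
    "user": {"name": "alice", "created_at": "2019-02-01T10:00:00"},
}
expected = {
    "id": ANY,
    "user": {"name": "alice", "created_at": ANY},
}

# ANY compares equal to anything, at any nesting level.
assert response == expected

# Fields that are not masked with ANY are still compared normally.
wrong_name = {"id": ANY, "user": {"name": "bob", "created_at": ANY}}
assert response != wrong_name
```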
stdout
sys.stdout is a wrapper that lets you write strings rather than bytes. These strings are automatically encoded using sys.stdout.encoding:
>>> import sys
>>> sys.stdout.write('Straße\n')
Straße
>>> sys.stdout.encoding
'UTF-8'
sys.stdout.encoding is read-only and equal to the default encoding, which can be configured using the PYTHONIOENCODING environment variable:
$ PYTHONIOENCODING=cp1251 python3
Python 3.6.6 (default, Aug 13 2018, 18:24:23)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.encoding
'cp1251'
If you want to write bytes to stdout, you can skip the automatic encoding by accessing the buffer inside the wrapper, sys.stdout.buffer:
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1251'>
>>> sys.stdout.buffer
<_io.BufferedWriter name='<stdout>'>
>>> sys.stdout.buffer.write(b'Stra\xc3\x9fe\n')
Straße
sys.stdout.buffer is also a wrapper. You can bypass it by accessing the raw file object with sys.stdout.buffer.raw:
>>> sys.stdout.buffer.raw.write(b'Stra\xc3\x9fe')
Straße
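The same layering can be reproduced without a terminal. The following is a minimal sketch that assumes an in-memory io.BytesIO as a stand-in for the real output stream and latin-1 as the text encoding; both are choices made for the illustration only.

```python
import io

# An in-memory byte stream plays the role of the terminal (an assumption of this sketch).
raw = io.BytesIO()
text_layer = io.TextIOWrapper(raw, encoding="latin-1", newline="\n")

# Strings go through the text layer and are encoded on the way down.
text_layer.write("Straße")
text_layer.flush()  # push pending text into the byte layer

# Bytes written to .buffer skip the encoding step entirely.
text_layer.buffer.write(b"Stra\xc3\x9fe")

written = raw.getvalue()
print(written)
```

The first six bytes are the latin-1 encoding produced by the wrapper, and the remaining seven are the UTF-8 bytes that were passed through untouched.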
Constant ellipsis
Python has very few built-in constants. One of them, Ellipsis, can also be written as ... (three dots). For the interpreter, this constant has no specific meaning, but it is used wherever such syntax is appropriate. numpy supports Ellipsis as an argument to __getitem__; for example, x[...] returns all elements of x. PEP 484 defines another meaning for this constant: Callable[..., type] lets you declare the type of a callable without specifying the types of its arguments. Finally, you can use ... to indicate that a function has not been implemented yet. This is completely valid Python code:
def x():
    ...
However, in Python 2, Ellipsis cannot be written as ... (three dots). The one exception is a[...], which is interpreted as a[Ellipsis]. All of the following syntax is valid in Python 3, but only the first line is valid in Python 2:
a[...]
a[...:2:...]
[..., ...]
{...:...}
a = ...
... is ...
def a(x=...): ...
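To see what the interpreter actually passes around, here is a small sketch for Python 3. The Recorder class and the apply helper are hypothetical names introduced only for this illustration.

```python
from typing import Callable


class Recorder:
    """Returns whatever key it is indexed with, so the key can be inspected."""

    def __getitem__(self, key):
        return key


def apply(factory: Callable[..., int], *args) -> int:
    # Callable[..., int]: any arguments are accepted, only the return type is declared.
    return factory(*args)


r = Recorder()
single = r[...]        # the key is the Ellipsis singleton itself
mixed = r[1, ..., 2]   # inside a tuple key it is just another element

assert single is Ellipsis
assert mixed == (1, Ellipsis, 2)
assert apply(max, 3, 7) == 7
assert apply(int) == 0
```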
Re-import modules
Modules that are already imported will not be loaded again. The statement import foo will simply do nothing. However, re-importing modules is useful when working in an interactive environment. In Python 3.4+, you should use importlib for this:
In [1]: import importlib
In [2]: with open('foo.py', 'w') as f:
   ...:     f.write('a = 1')
   ...:
In [3]: import foo
In [4]: foo.a
Out[4]: 1
In [5]: with open('foo.py', 'w') as f:
   ...:     f.write('a = 2')
   ...:
In [6]: foo.a
Out[6]: 1
In [7]: import foo
In [8]: foo.a
Out[8]: 1
In [9]: importlib.reload(foo)
Out[9]:
In [10]: foo.a
Out[10]: 2
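The same experiment can be run as a plain script. This is a sketch under a few assumptions: the module name reload_demo_mod is hypothetical, the module lives in a temporary directory, and bytecode caching is switched off so that two quick edits of the same size cannot be mistaken for an unchanged file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumption of this sketch: disable .pyc files so a fast rewrite of the
# source is never shadowed by a cached copy with the same mtime and size.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "reload_demo_mod.py"  # hypothetical module name
    source.write_text("a = 1\n")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("reload_demo_mod")
        first = module.a

        source.write_text("a = 2\n")

        # A second import returns the cached module and does not re-read the file.
        module = importlib.import_module("reload_demo_mod")
        after_import = module.a

        # reload() re-executes the module's source in the existing module object.
        importlib.reload(module)
        after_reload = module.a
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("reload_demo_mod", None)

print(first, after_import, after_reload)
```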
IPython also has the autoreload extension, which automatically re-imports modules when necessary:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: with open('foo.py', 'w') as f:
   ...:     f.write('print("LOADED"); a=1')
   ...:
In [4]: import foo
LOADED
In [5]: foo.a
Out[5]: 1
In [6]: with open('foo.py', 'w') as f:
   ...:     f.write('print("LOADED"); a=2')
   ...:
In [7]: import foo
LOADED
In [8]: foo.a
Out[8]: 2
In [9]: with open('foo.py', 'w') as f:
   ...:     f.write('print("LOADED"); a=3')
   ...:
In [10]: foo.a
LOADED
Out[10]: 3
\G
In some languages, regular expressions support the \G assertion. It matches at the position where the previous match ended. This makes it possible to write finite state machines that process a string word by word (where a word is defined by a regular expression). Python has nothing like it, but you can implement similar functionality by manually tracking the position and passing part of the string to the regular expression functions:
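import re
import json

# XML-like sample input (illustrative markup)
text = '<a><b>foo</b><c>bar</c></a><z>bar</z>'
# Three alternatives: opening tag, closing tag, plain data
regex = '^(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))'

stack = []
tree = []
pos = 0
while len(text) > pos:
    error = f'Error at {text[pos:]}'
    # Slice off the already processed part and match at the start of the rest
    found = re.search(regex, text[pos:])
    assert found, error
    pos += len(found[0])
    start, stop, data = found.groups()
    if start:
        # Opening tag: add a node and descend into its children
        tree.append(dict(
            tag=start,
            children=[],
        ))
        stack.append(tree)
        tree = tree[-1]['children']
    elif stop:
        # Closing tag: go back up and check that it matches the open node
        tree = stack.pop()
        assert tree[-1]['tag'] == stop, error
        if not tree[-1]['children']:
            tree[-1].pop('children')
    elif data:
        stack[-1][-1]['data'] = data

print(json.dumps(tree, indent=4))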
In the example above, you can save time by not slicing the string over and over, and instead asking the re module to start searching from a different position. This requires some changes to the code. First, re.search does not support specifying the position at which the search starts, so you have to compile the regular expression manually. Second, ^ denotes the beginning of the string, not the position where the search starts, so you need to check manually that the match is found at that exact position.
import re
import json

# XML-like sample input (illustrative markup), repeated to make timing meaningful
text = '<a><b>foo</b><c>bar</c></a><z>bar</z>' * 10

def print_tree(tree):
    print(json.dumps(tree, indent=4))

def xml_to_tree_slow(text):
    regex = '^(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))'
    stack = []
    tree = []
    pos = 0
    while len(text) > pos:
        error = f'Error at {text[pos:]}'
        # A new substring is created on every iteration
        found = re.search(regex, text[pos:])
        assert found, error
        pos += len(found[0])
        start, stop, data = found.groups()
        if start:
            tree.append(dict(
                tag=start,
                children=[],
            ))
            stack.append(tree)
            tree = tree[-1]['children']
        elif stop:
            tree = stack.pop()
            assert tree[-1]['tag'] == stop, error
            if not tree[-1]['children']:
                tree[-1].pop('children')
        elif data:
            stack[-1][-1]['data'] = data
    return tree

# No ^ here: it would only match at the very beginning of the string
_regex = re.compile('(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))')

def _error_message(text, pos):
    # Built only when an assertion fails, so no slicing on the happy path
    return f'Error at {text[pos:]}'

def xml_to_tree_fast(text):
    stack = []
    tree = []
    pos = 0
    while len(text) > pos:
        # Search from pos without slicing the string
        found = _regex.search(text, pos=pos)
        assert found, _error_message(text, pos)
        # Check manually that the match starts exactly at pos
        begin, end = found.span(0)
        assert begin == pos, _error_message(text, pos)
        pos += len(found[0])
        start, stop, data = found.groups()
        if start:
            tree.append(dict(
                tag=start,
                children=[],
            ))
            stack.append(tree)
            tree = tree[-1]['children']
        elif stop:
            tree = stack.pop()
            assert tree[-1]['tag'] == stop, _error_message(text, pos)
            if not tree[-1]['children']:
                tree[-1].pop('children')
        elif data:
            stack[-1][-1]['data'] = data
    return tree

print_tree(xml_to_tree_fast(text))
Results:
In [1]: from example import *
In [2]: %timeit xml_to_tree_slow(text)
356 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [3]: %timeit xml_to_tree_fast(text)
294 µs ± 6.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
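A related option is the match method of a compiled pattern, which also accepts a start position and only succeeds if the match begins exactly there, so no manual position check is needed. The sketch below applies it to a tiny tokenizer; the TOKEN pattern and the tokenize function are hypothetical examples, not part of the code above.

```python
import re

# Optional whitespace, then exactly one of: a number, a name, an operator.
TOKEN = re.compile(r"\s*(?:(\d+)|([a-z]+)|([+*()-]))")


def tokenize(text):
    """Split text into (kind, value) pairs, scanning strictly left to right."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        # match() with a position is anchored at pos, unlike search().
        found = TOKEN.match(text, pos)
        if found is None:
            raise ValueError(f"Unexpected input at position {pos}: {text[pos:]!r}")
        number, name, op = found.groups()
        if number is not None:
            tokens.append(("number", number))
        elif name is not None:
            tokens.append(("name", name))
        else:
            tokens.append(("op", op))
        # Continue from where this match ended.
        pos = found.end()
    return tokens


tokens = tokenize("2 * (width + 10)")
print(tokens)
```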
Rounding numbers
This item was written by orsinium, author of the Telegram channel @itgram_channel.
The round function rounds a number to the specified number of decimal places.
>>> round(1.2)
1
>>> round(1.8)
2
>>> round(1.228, 1)
1.2
You can also set a negative rounding precision:
>>> round(413.77, -1)
410.0
>>> round(413.77, -2)
400.0
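round returns a value of the same type as the input number:
>>> from decimal import Decimal
>>> from fractions import Fraction
>>> type(round(2, 1))
<class 'int'>
>>> type(round(2.0, 1))
<class 'float'>
>>> type(round(Decimal(2), 1))
<class 'decimal.Decimal'>
>>> type(round(Fraction(2), 1))
<class 'fractions.Fraction'>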
For your own classes, you can define how round behaves using the __round__ method:
>>> class Number(int):
...     def __round__(self, p=-1000):
...         return p
...
>>> round(Number(2))
-1000
>>> round(Number(2), -2)
-2
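A more practical use of __round__ is to forward the call to the object's components. The Point class below is a hypothetical example, sketched only to show the protocol; ndigits defaults to None so that round(point) behaves like round(x) for plain floats.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float

    def __round__(self, ndigits=None):
        # round(p) calls __round__() with no argument,
        # round(p, n) calls __round__(n).
        return Point(round(self.x, ndigits), round(self.y, ndigits))


p = Point(1.26, 2.74)
to_integers = round(p)      # Point(x=1, y=3)
to_tenths = round(p, 1)     # Point(x=1.3, y=2.7)
print(to_integers, to_tenths)
```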
Values are rounded to the nearest multiple of 10 ** (-precision). For example, with precision=1 the value is rounded to a multiple of 0.1: round(0.63, 1) returns 0.6. If two multiples are equally close, the even one is chosen:
>>> round(0.5)
0
>>> round(1.5)
2
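The tie-breaking rule is easier to see on a series of values. This short sketch uses only numbers that are exactly halfway between two candidates, so floating point representation does not interfere; the variable names are chosen for the illustration.

```python
# Halves are exactly representable as floats, so these are true ties.
halves = [0.5, 1.5, 2.5, 3.5, -0.5, -1.5]
rounded_halves = [round(x) for x in halves]

# The same rule applies to integers with a negative precision.
fives = [5, 15, 25, 35]
rounded_fives = [round(n, -1) for n in fives]

print(rounded_halves)  # [0, 2, 2, 4, 0, -2]
print(rounded_fives)   # [0, 20, 20, 40]
```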
Sometimes rounding a floating point number can give an unexpected result:
>>> round(2.85, 1)
2.9
This happens because most decimal fractions cannot be represented exactly as a floating point number (https://docs.python.org/3.7/tutorial/floatingpoint.html):
>>> format(2.85, '.64f')
'2.8500000000000000888178419700125232338905334472656250000000000000'
If you want to round half up, use decimal.Decimal:
>>> from decimal import Decimal, ROUND_HALF_UP
>>> Decimal(1.5).quantize(0, ROUND_HALF_UP)
Decimal('2')
>>> Decimal(2.85).quantize(Decimal('1.0'), ROUND_HALF_UP)
Decimal('2.9')
>>> Decimal(2.84).quantize(Decimal('1.0'), ROUND_HALF_UP)
Decimal('2.8')
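One detail worth keeping in mind: a Decimal built from a float inherits the float's representation error, so the rounding mode only gives the expected answer when the value is built from a string. The sketch below compares both; the value 2.675 and the variable names are chosen for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

cent = Decimal("0.01")

# The float 2.675 is stored slightly below 2.675, so it is not a tie at all.
from_float = Decimal(2.675).quantize(cent, ROUND_HALF_UP)

# The string "2.675" is stored exactly, so half-up rounding applies.
from_string = Decimal("2.675").quantize(cent, ROUND_HALF_UP)

# Same exact input, two rounding modes.
half_up = Decimal("0.125").quantize(cent, ROUND_HALF_UP)
half_even = Decimal("0.125").quantize(cent, ROUND_HALF_EVEN)

print(from_float, from_string)  # 2.67 2.68
print(half_up, half_even)       # 0.13 0.12
```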