Compilation @pythonetc, July 2018

This is the second selection of tips on Python and programming from my @pythonetc channel. Previous selections:

June 2018

Regular languages

A regular language is a formal language that can be recognized by a finite state machine. In other words, to process text character by character you only need to remember the current state, and the number of such states is finite.

A great example is a machine that checks whether the input is a simple number such as –3, 2.2, or 001. The state machine is shown at the beginning of the article. Double circles mark final states, in which the machine is allowed to stop.

The machine starts at position ①. It may find a minus sign, then a digit, and then at position ③ it consumes as many digits as needed. After that it may encounter the decimal separator (③ → ④), which must be followed by one digit (④ → ⑤) or more (⑤ → ⑤).
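The walk through the states described above can be sketched as a small transition table. This is only an illustration: the figure is not reproduced here, so state 2 (reached after a minus sign) and the choice of states 3 and 5 as final are assumptions, as are the names TRANSITIONS, classify and accepts. A plain ASCII hyphen is used as the minus sign.

```python
# Transition table: state -> {character class -> next state}.
# State numbers follow the circled positions in the text; state 2 and
# the set of final states are assumptions made for this sketch.
TRANSITIONS = {
    1: {"minus": 2, "digit": 3},
    2: {"digit": 3},
    3: {"digit": 3, "dot": 4},
    4: {"digit": 5},
    5: {"digit": 5},
}
FINAL_STATES = {3, 5}


def classify(char):
    """Map a character to the class used in the transition table."""
    if char in "0123456789":
        return "digit"
    if char == "-":
        return "minus"
    if char == ".":
        return "dot"
    return "other"


def accepts(text):
    """Return True if the machine stops in a final state after reading text."""
    state = 1
    for char in text:
        state = TRANSITIONS[state].get(classify(char))
        if state is None:  # no transition for this character
            return False
    return state in FINAL_STATES


print(accepts("-3"), accepts("2.2"), accepts("001"), accepts("1."))
# True True True False
```

Note that the function keeps nothing but the current state between characters, which is exactly the property that makes the language regular.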

The classic example of a non-regular language is the family of strings of the form:

a-b

aaa-bbb

aaaaa-bbbbb

Formally, we need a string containing N instances of a, then -, then N instances of b, where N is an integer greater than 0. You cannot implement this with a finite state machine, because you would have to remember how many a characters you have seen, and that requires an infinite number of states.

Regular expressions can only describe regular languages. Before using them, make sure that your input can be processed by a finite state machine. For example, they are not suitable for processing JSON, XML, or even arithmetic expressions with nested brackets.
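For contrast, here is a minimal sketch of a checker for the a-b family that is allowed to count. It compares the lengths of the two halves, which is precisely the unbounded memory a finite state machine does not have. The function name is_an_dash_bn is made up for illustration, and a plain ASCII hyphen is assumed as the separator.

```python
def is_an_dash_bn(text):
    """Check for N copies of 'a', a hyphen, then N copies of 'b' (N > 0)."""
    left, separator, right = text.partition("-")
    return (
        separator == "-"
        and len(left) > 0
        and len(left) == len(right)  # the counting step a finite machine cannot do
        and set(left) == {"a"}
        and set(right) == {"b"}
    )


print(is_an_dash_bn("aaa-bbb"), is_an_dash_bn("aa-bbb"))
# True False
```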

Funnily enough, many modern regular expression engines are not strictly regular. For example, the regex module for Python supports recursion (which helps with the aaa-bbb problem).

Dynamic dispatch

When Python executes a method call, say a.f(b, c, d), it must first select the correct function f. Because of polymorphism, a determines which function is ultimately selected. The process of choosing a method is usually called dynamic dispatch.

Python only supports single-dispatch polymorphism. This means that the choice of function is affected only by the object itself (a in our example). In other languages, the types of b, c and d can also be taken into account; this mechanism is called multiple dispatch. A notable example is C#.

However, multiple dispatch can be emulated with single dispatch. This is what the “visitor” design pattern was created for: it uses single dispatch twice to simulate double dispatch.
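A minimal sketch of the visitor idea might look like this. All class and method names here (Number, Text, PlainRenderer, HtmlRenderer, render) are invented for the illustration; the point is only that the result depends on the types of both objects, even though each individual call uses ordinary single dispatch.

```python
class Number:
    def accept(self, visitor):
        # First dispatch: this method was chosen by the type of the node.
        # Second dispatch: visit_number is chosen by the type of the visitor.
        return visitor.visit_number(self)


class Text:
    def accept(self, visitor):
        return visitor.visit_text(self)


class PlainRenderer:
    def visit_number(self, node):
        return "plain number"

    def visit_text(self, node):
        return "plain text"


class HtmlRenderer:
    def visit_number(self, node):
        return "<b>number</b>"

    def visit_text(self, node):
        return "<p>text</p>"


def render(node, renderer):
    """Pick behaviour from the types of both node and renderer."""
    return node.accept(renderer)


print(render(Number(), HtmlRenderer()))  # <b>number</b>
print(render(Text(), PlainRenderer()))   # plain text
```

The cost of this design is that every new node type requires a new visit_* method in each visitor.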

Remember that method overloading (as in Java and C++) is not the same thing as multiple dispatch. Dynamic dispatch works at runtime, whereas overloading is resolved at compile time.


Built-in names

In Python, you can easily reassign any of the standard names that are available in the global scope:

>>> print = 42
>>> print(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not callable

This is useful if your module defines functions whose names match the names of built-in functions. It also happens when you practice metaprogramming and accept an arbitrary string as an identifier.

But even if you duplicate the names of some built-in functions, you may need access to what they originally referred to. This is what the builtins module is for:

>>> import builtins
>>> print = 42
>>> builtins.print(1)
1

The __builtins__ variable is also available in most modules, but there are a couple of catches. First, it is an implementation detail of CPython and usually should not be used at all. Second, __builtins__ can refer either to builtins or to builtins.__dict__, depending on how exactly the current module was loaded.
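If you do end up reading __builtins__, a defensive helper along these lines copes with both forms. This is only a sketch: the name builtins_namespace is hypothetical, and importing builtins directly remains the simpler and more portable choice.

```python
import builtins


def builtins_namespace(value):
    """Return the built-in namespace dict for either form of __builtins__."""
    if isinstance(value, dict):
        return value  # already builtins.__dict__
    return vars(value)  # the builtins module itself


# Both forms lead to the same namespace.
from_module = builtins_namespace(builtins)
from_dict = builtins_namespace(builtins.__dict__)
print(from_module is from_dict)                 # True
print(from_module["print"] is builtins.print)   # True
```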

strace

Sometimes an application starts to behave strangely in production. Instead of restarting it right away, you may want to investigate the cause of the problem while you still can.

The obvious solution is to analyze the program's actions and try to understand what part of the code is being executed. Proper logging facilitates this task, but your logs may not be sufficiently detailed due to the architecture or the level of logging selected in the settings.

In such cases, strace may be useful. It is a Unix utility that traces system calls. You can launch a program under it (strace python script.py), but it is usually more convenient to attach to an already running application: strace -p PID.

$ cat test.py
with open('/tmp/test', 'w') as f:
    f.write('test')
$ strace python test.py 2>&1 | grep open | tail -n 1
open("/tmp/test", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3

Each line of the trace contains the name of the system call, its arguments in parentheses, and the return value. Because some arguments are used to return the result of a system call rather than to pass data to it, the output of a line may be paused until the system call completes.

In this example, the output is paused until something is written to STDIN:

$ strace python -c 'input()'
read(0,
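The line structure described above (call name, arguments in parentheses, return value) can be pulled apart with a short regular expression. This is a rough sketch rather than a full strace parser: the pattern and the name parse_strace_line are assumptions, and real traces contain other kinds of lines that it will not match.

```python
import re

# name(arguments) = result
STRACE_LINE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<result>.+)$")


def parse_strace_line(line):
    """Split a completed strace line into (name, arguments, result)."""
    match = STRACE_LINE.match(line.strip())
    if match is None:
        # For example a call that has not returned yet, such as "read(0,".
        return None
    return match["name"], match["args"], match["result"]


line = 'open("/tmp/test", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3'
print(parse_strace_line(line))
print(parse_strace_line("read(0,"))  # None
```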

Tuple literals

One of the most inconsistent parts of the Python syntax is tuple literals.

To create a tuple, it is enough to list the values separated by commas: 1, 2, 3. What about a one-element tuple? Just add a trailing comma: 1,. It looks ugly and often leads to errors, but it is quite logical.

How about an empty tuple? Is it a lone comma (,)? No, it is (). So do parentheses create a tuple, just like commas? No, (4) is not a tuple, it is simply 4.

In : a = [
...:     (1, 2, 3),
...:     (1, 2),
...:     (1),
...:     (),
...: ]
In : [type(x) for x in a]
Out: [tuple, tuple, int, tuple]

To make things even more confusing, tuple literals often require additional parentheses. If you need a tuple to be the only argument of a function, then f(1, 2, 3) obviously will not work: you will have to write f((1, 2, 3)).
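The cases from this section can be checked in a few lines. The helper count_args is made up for the illustration; it simply reports how many positional arguments it received.

```python
def count_args(*args):
    """Return the number of positional arguments received."""
    return len(args)


numbers = 1, 2, 3      # commas make the tuple, parentheses are optional
single = 1,            # one-element tuple needs a trailing comma
empty = ()             # the empty tuple is the exception: parentheses only
not_a_tuple = (4)      # parentheses alone just group the expression

print(type(numbers), type(single), type(empty), type(not_a_tuple))
print(count_args(1, 2, 3))    # three separate arguments
print(count_args((1, 2, 3)))  # one argument that happens to be a tuple
```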
