What a programmer who is switching to Python needs to remember

Once upon a time, in my student years, I was bitten by a python, but the incubation period dragged on, and I ended up becoming a Perl programmer.

At some point, however, Perl had run its course for me and I decided to take up Python. At first I just built things and figured out whatever each task required, but then I realized that I needed some systematic knowledge and read several books:

  • Bill Lubanovic, "Simple Python. Modern Programming Style"
  • Dan Bader, "Pure Python. Programming Subtleties for Pros"
  • Brett Slatkin, "Python Secrets: 59 Recommendations for Writing Effective Code"

These seemed quite adequate for understanding the main subtleties of the language, although I don't recall them mentioning __slots__. Then again, I'm not sure it is a really necessary feature: if memory is already that tight, this technique most likely won't be enough, but of course it all depends on the situation.

As a result, I have accumulated some notes on Python's peculiarities which, I think, can be useful to anyone migrating to it from other languages.

I have noticed that Python interviews quite often include questions about things unrelated to real development, such as what can be a dictionary key (or what x = yield y means). Come on, folks, in real life a key is only ever a number or a string, and in the rare cases when it isn't, you can read the documentation and figure it out. Why ask this? To find something the candidate doesn't know? In the end everyone will memorize the answer to this question and it will stop working.

I consider Python versions 3.5 and above relevant (it is time to forget about Python 2), because that is the version shipped in Debian stable, which means everywhere else has something newer.

Since I am no Python guru at all, I hope to be corrected in the comments if I have blurted out some nonsense.


Python is a dynamically typed language, i.e. it checks type conformance at run time, for example:

cat type.py


python3 type.py
... TypeError: unsupported operand type(s) for +: 'int' and 'str'
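
The listing of type.py is not shown here. A minimal sketch that raises the same TypeError, assuming the script simply adds an integer to a string (the names a and b are hypothetical, and the try/except is added only so the snippet runs to completion):

```python
# Hypothetical stand-in for type.py: the names a and b are assumptions.
a = 1
b = '3'

try:
    result = a + b          # int + str is rejected only at run time
except TypeError as exc:
    message = str(exc)
    print(message)          # unsupported operand type(s) for +: 'int' and 'str'
```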

However, if your project has matured to the point of needing static typing, Python offers that too, through the static analyzer mypy:

mypy type.py
type.py:3: error: Unsupported operand types for + ("int" and "str")

True, not every error is caught this way:

cat type2.py
def greeting(name):
    return 'Hello ' + name

greeting(1)  # no annotations, so mypy has nothing to check here

mypy does not complain here, and the error only occurs at run time, so current versions of Python support a special syntax for specifying the types of function arguments:

cat type3.py
def greeting(name: str) -> str:
    return 'Hello ' + name

greeting(1)

and now:

mypy type3.py
type3.py:4: error: Argument 1 to "greeting" has incompatible type "int"; expected "str"

Variables and data

Variables in Python do not store data but only refer to it, and data can be either mutable or immutable.
This leads to different behavior in almost identical situations, depending on the type of data. For example, the following code:

x = 1
y = x
x = 2

leaves the variables x and y referring to different data, whereas this:

x = [1, 2, 3]
y = x
x[0] = 7

does not: x and y remain references to the same list (although, as noted in the comments, the example is not a very good one, and I have not figured out a better one yet). By the way, in Python you can check this with the is operator (I am sure the creator of Java lost sleep for good out of shame when he learned about this operator in Python).
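
A small sketch of the difference between rebinding a name and mutating shared data, checked with is (identity) and == (equality). The variable z is an extra name introduced only for illustration:

```python
# Rebinding: x gets a new object, y keeps pointing at the old one.
x = 1
y = x
x = 2
print(x, y)            # 2 1

# Mutation: both names still refer to the same list object.
x = [1, 2, 3]
y = x
x[0] = 7
print(y)               # [7, 2, 3]
print(x is y)          # True

# Equal content does not imply the same object.
z = list(x)
print(z == x, z is x)  # True False
```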

Although strings look like lists, they are an immutable data type. This means the string itself cannot be changed; you can only create a new one. You can still assign a different value to the variable, and the original data will not change:

>>> mystr = 'sss'
>>> other = mystr  # a second reference to the same data
>>> mystr[0] = 'a'
  TypeError: 'str' object does not support item assignment
>>> mystr = 'ssa'  # rebind the original variable
>>> other  # the data is unchanged and still reachable through the second reference
'sss'

Speaking of strings: because of their immutability, concatenating a very large number of strings by addition or appending in a loop may not be very efficient (it depends on the particular implementation and version). For such cases the join method is usually recommended, and it behaves a little unexpectedly:

>>> str_list = ['ss', 'dd', 'gg']
>>> 'XXX'.join(str_list)
'ssXXXddXXXgg'
>>> word = 'hello'
>>> 'XXX'.join(word)
'hXXXeXXXlXXXlXXXo'

First, the string on which the method is called becomes the separator, not the beginning of the new string as one might think. Second, you need to pass a list (an iterable), not a single string, because a string is itself an iterable and will be joined character by character.

Since variables are references, it is quite natural to want a copy of an object so as not to break the original. There is a pitfall, though: the copy function copies only one level, which is clearly not what you expect from a function with that name, so use deepcopy instead.
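
A minimal sketch of the difference, using copy.copy and copy.deepcopy from the standard copy module. The dictionary and its keys are made up for illustration:

```python
import copy

original = {'name': 'config', 'ports': [80, 443]}

shallow = copy.copy(original)      # new dict, but the inner list is shared
deep = copy.deepcopy(original)     # new dict and a new inner list

original['ports'].append(8080)

print(shallow['ports'])  # [80, 443, 8080]
print(deep['ports'])     # [80, 443]
```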

A similar copying problem can occur when a collection is multiplied by a scalar, as was recently discussed here.
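
A sketch of this pitfall, assuming the nested-list case is the one meant (the names grid and safe_grid are illustrative):

```python
# Multiplying a list repeats references to the same inner object.
grid = [[0] * 3] * 2
grid[0][0] = 1
print(grid)          # [[1, 0, 0], [1, 0, 0]]

# A comprehension builds an independent row on every iteration.
safe_grid = [[0] * 3 for _ in range(2)]
safe_grid[0][0] = 1
print(safe_grid)     # [[1, 0, 0], [0, 0, 0]]
```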

Scope

The scope of a variable in Python is limited to the module or function in which it is defined, plus any nested functions. There is a subtlety, though: by default a variable is available only for reading in nested namespaces, while modifying it requires the special keywords nonlocal and global, for variables in the enclosing function and at module level respectively.

For example, the following code:

x = 7
def func():
    return x

works with a single global variable, whereas this:

x = 7
def func():
    x = 1
    return x

already creates a local one.
From my point of view this is not very good: in theory, any use of non-local variables in a function is part of the function's public interface, its signature, and should therefore be declared explicitly, preferably at the beginning of the function.
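
The global and nonlocal keywords mentioned above can be sketched as follows. All function names here are hypothetical and exist only for the demonstration:

```python
counter = 0

def read_only():
    return counter            # reading a global needs no declaration

def shadow():
    counter = 100             # creates a new local; the global is untouched
    return counter

def bump_global():
    global counter            # allows rebinding the module-level name
    counter += 1

def make_counter():
    count = 0
    def increment():
        nonlocal count        # allows rebinding the enclosing function's variable
        count += 1
        return count
    return increment

shadow()
bump_global()
print(counter)                # 1

inc = make_counter()
inc()
print(inc())                  # 2
```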

Function arguments

Python provides simply awesome options for specifying function arguments: positional arguments, named arguments, and combinations of the two.

But you need to understand how arguments are passed. Since all variables in Python are references to data, you might guess that arguments are passed by reference, but there is a catch: the reference itself is passed by value. That is, you can modify a mutable value through the reference:

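cat arg_modify.py
def add_element(mylist):
    mylist.append(3)  # mutates the list object the caller passed in

mylist = [1, 2]
add_element(mylist)
print(mylist)
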
python3 arg_modify.py
[1, 2, 3]

However, you cannot overwrite the caller's reference from inside the function:

cat arg_kill.py
def try_del(mylist):
    mylist = []  # rebinds only the local name
    return mylist

mylist = [1, 2]
try_del(mylist)
print(mylist)

The original reference is alive and well:

python3 arg_kill.py
[1, 2]

You can also set default values for arguments, but there is one non-obvious thing to remember: default values are evaluated once, when the function is defined. This causes no problems if you use immutable data as defaults, but if you use mutable data or a dynamic value, the result will be a bit unexpected:

Mutable data:

cat arg_list.py
def func(arg=[]):
    arg.append('x')  # the same default list is reused across calls
    return arg

func()
print(func())
print(func())

python3 arg_list.py
['x', 'x']
['x', 'x', 'x']

Dynamic value:

cat arg_now.py
from datetime import datetime
def func(arg=datetime.now()):  # evaluated once, at definition time
    return arg

print(func())
print(func())
print(func())

we get:

python3 arg_now.py
2018-09-28 10:28:40.771879
2018-09-28 10:28:40.771879
2018-09-28 10:28:40.771879
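
A common way around both surprises, shown here as a sketch rather than something taken from the listings above, is to use None as the default and build the real value inside the function. The function names append_x and timestamp are illustrative:

```python
from datetime import datetime

def append_x(arg=None):
    if arg is None:
        arg = []              # a fresh list on every call
    arg.append('x')
    return arg

def timestamp(arg=None):
    if arg is None:
        arg = datetime.now()  # evaluated at call time, not at definition time
    return arg

print(append_x())  # ['x']
print(append_x())  # ['x']
```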


OOP in Python is done quite interestingly (property alone is worth a lot), and it is a big topic, but anyone familiar with OOP can easily google whatever they need (or find it on Habr), so there is no point in repeating it here. The only downside of standard classes is the boilerplate of dunder methods; I personally like the attrs library, which is much more Pythonic.
It is worth mentioning that since everything in Python is an object, including functions and classes, classes can be created dynamically (without using eval) with the type function.
It is also worth reading about metaclasses (on Habr) and descriptors (Habr).
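
Dynamic class creation with the three-argument form of type can be sketched like this. The classes Animal and Dog and their attributes are invented for the example:

```python
def describe(self):
    return '{} has {} legs'.format(self.name, self.legs)

def init(self, name):
    self.name = name

# type(name, bases, namespace) builds a class at run time.
Animal = type('Animal', (object,), {'legs': 4, 'describe': describe})
Dog = type('Dog', (Animal,), {'__init__': init})

rex = Dog('Rex')
print(rex.describe())       # Rex has 4 legs
print(type(rex).__name__)   # Dog
```
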
A feature to remember is that class attributes and instance attributes are not the same thing. With immutable attributes this causes no problems, because assignment "shadows" them: an instance attribute with the same name is created automatically. With mutable attributes, however, you can get something other than what you expected:

cat class_attr.py
class MyClass:
    storage = [7,]
    def __init__(self, number):
        self.number = number
obj = MyClass(1)
obj2 = MyClass(2)
obj.number = 5
obj.storage.append(8)
print(obj2.storage, obj2.number)

we get:

python3 class_attr.py
[7, 8] 2

As you can see, we modified obj, but storage changed for obj2 as well, because this attribute (unlike number) belongs not to the instance but to the class.
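
A sketch contrasting the two behaviors, with hypothetical classes Shared and Separate: mutation goes through to the class-level object, assignment creates a shadowing instance attribute, and creating the list in __init__ gives each instance its own storage.

```python
class Shared:
    storage = [7]             # one list shared by every instance
    limit = 10                # immutable class attribute

class Separate:
    def __init__(self):
        self.storage = [7]    # a new list per instance

a, b = Shared(), Shared()
a.storage.append(8)           # mutates the class-level list
a.limit = 20                  # creates an instance attribute that shadows the class one
print(b.storage, b.limit)     # [7, 8] 10

c, d = Separate(), Separate()
c.storage.append(8)
print(d.storage)              # [7]
```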

.sort() vs sorted()

In Python there are two ways to sort a list. The first is the .sort() method, which modifies the original list and returns nothing (None), i.e. you can't do this:

my_list = my_list.sort()

The second is the sorted() function, which produces a new list and works with any iterable. If you want more information, start with SO.
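
A short sketch of both approaches; the sample data is arbitrary:

```python
numbers = [3, 1, 2]

result = numbers.sort()       # sorts in place and returns None
print(result, numbers)        # None [1, 2, 3]

letters = ('b', 'c', 'a')     # sorted() accepts any iterable, here a tuple
ordered = sorted(letters)
print(ordered, letters)       # ['a', 'b', 'c'] ('b', 'c', 'a')

by_length = sorted(['ccc', 'a', 'bb'], key=len, reverse=True)
print(by_length)              # ['ccc', 'bb', 'a']
```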

Standard library

As a rule, the Python standard library contains excellent solutions to typical problems, but it is worth approaching it critically, because there are plenty of oddities. True, sometimes what looks strange at first glance turns out to be the best solution once you know all the circumstances (see range below), but oddities remain.

For example, the bundled unit-testing module unittest has nothing Pythonic about it and smells of Java, which is why, as the author of Python puts it: "Everybody is using py.test ...". That said, the standard library also ships the quite interesting, though not always appropriate, doctest module.

The bundled urllib module does not have as nice an interface as the third-party requests module.

The same goes for parsing command-line arguments: the bundled argparse is a showcase of OOP on the brain, while the docopt module looks like a really smart solution, the ultimate in self-documentation! Although, rumor has it, even with docopt around there is still a niche for click.

The same with the debugger: as far as I can tell, few people use the bundled pdb. There are many alternatives, but it seems most developers use ipdb, which, from my point of view, is most convenient to use through the debug wrapper module.
It lets you simply write import debug instead of import ipdb; ipdb.set_trace(), and it also pulls in the see module for convenient object inspection.

As a replacement for the standard serialization module pickle there is dill. By the way, remember that these modules are not suitable for exchanging data with external systems, because restoring arbitrary objects received from an untrusted source is unsafe; for such cases there are json (for REST) and gRPC (for RPC).
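
A minimal sketch of exchanging data as JSON with the standard json module. The payload is made up; the point is that loading JSON only ever produces plain dicts, lists, strings, numbers, booleans and None, not arbitrary objects:

```python
import json

payload = {'user': 'alice', 'scores': [1, 2, 3], 'active': True}

wire = json.dumps(payload, sort_keys=True)   # plain text, suitable for sending
print(wire)  # {"active": true, "scores": [1, 2, 3], "user": "alice"}

restored = json.loads(wire)                  # builds only basic Python types
print(restored == payload)                   # True
```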

As a replacement for re, the standard regular expression module, there is the regex module with all sorts of extra goodies, such as character classes like \p{Cyrillic}.
By the way, I have not come across a fun regex debugger for Python similar to the one Perl has.

Here is another example: someone wrote their own in-place module to fix the clumsiness and incompleteness of the standard fileinput module's API for in-place file editing.

Well, I think there are many such cases, since even I have come across more than one, so stay vigilant and don't forget to look through the various awesome lists of useful things. I think a good Pythonista has a nose for the degree of Pythonicness. This, by the way, is a topic for a separate conversation: my feeling (there are no statistics on this, of course, and apparently there cannot be) is that the level of specialists in the Python world is above average, because good software often turns out to be written in Python. Write in the comments what you think about this.

Parallelism and Concurrency

Python offers ample opportunities for both parallel and concurrent programming, but not without its quirks.

If you need parallelism, which is the case when your tasks are computation-heavy, look at the multiprocessing module.

If your tasks spend a lot of time waiting on I/O, Python offers plenty of options to choose from, from threads and gevent to asyncio.
All of these look quite usable (although threads require noticeably more resources), but there is a feeling that asyncio is slowly pushing the rest aside, thanks in part to goodies like uvloop.

In case anyone missed it: threads in Python are not about parallelism. I am not competent enough to explain the GIL well, and there is plenty of material on the topic, so there is no need. The main thing to remember is that threads in Python (more precisely, in CPython) behave differently from threads in other programming languages: they effectively run on only one core at a time, so they are not suitable when you need real parallelism. However, a thread is suspended while it waits for input/output, so you can use threads for concurrency.
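
A sketch of threads used for concurrency on I/O-bound work, assuming the standard concurrent.futures.ThreadPoolExecutor as the thread API. The function fake_io is hypothetical, and time.sleep merely stands in for a real network or disk wait:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(task_id):
    time.sleep(0.2)           # stands in for a network or disk wait
    return task_id * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, range(4)))
elapsed = time.perf_counter() - start

print(results)                # [0, 2, 4, 6]
# The waits overlap, so elapsed is expected to be close to 0.2 s rather than 4 * 0.2 s.
print(round(elapsed, 2))
```

For CPU-bound work the same pattern would not help in CPython; that is where multiprocessing comes in.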

Other oddities

In Python, a = a + b is not always equivalent to a += b:

a = [1]
a = a + (2,3)
TypeError: can only concatenate list (not "tuple") to list
a += (2,3)
[1, 2, 3]

For details I refer you to SO, since I have not yet found the time to work out why it was done this way; it looks like it is about mutability again.
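
A short sketch of what appears to be going on: for lists, += calls list.__iadd__, which extends the list in place and accepts any iterable, while + calls list.__add__, which builds a new list and insists on another list. The alias names are introduced only to make the identity visible:

```python
a = [1]
alias = a
a += (2, 3)                 # in-place extend; any iterable is accepted
print(a, alias is a)        # [1, 2, 3] True

b = [1]
alias_b = b
b = b + [2, 3]              # builds a new list; the old one is untouched
print(b, alias_b)           # [1, 2, 3] [1]

try:
    [1] + (2, 3)
except TypeError as exc:
    print(exc)              # can only concatenate list (not "tuple") to list
```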

Oddities that are not weird

At first glance it seemed strange to me that the range type does not include the right boundary, but then a kind person pointed me to where I could read up on it, and it turned out that everything is quite logical.
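
A few properties that make the half-open convention convenient, offered as a sketch of the usual arguments rather than a summary of the linked explanation:

```python
items = ['a', 'b', 'c', 'd', 'e']

print(len(range(2, 5)))                        # 3, i.e. stop - start
print(list(range(len(items))))                 # [0, 1, 2, 3, 4], exactly the valid indexes

# Adjacent half-open ranges tile without gaps or overlaps.
print(list(range(0, 3)) + list(range(3, 5)))   # [0, 1, 2, 3, 4]
print(items[:2] + items[2:] == items)          # True
```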

A separate big topic is rounding (although this problem is common to almost all programming languages). Besides the fact that rounding works differently from what everyone learned in school mathematics, floating-point issues are layered on top; I refer you to a detailed article.
Roughly speaking, instead of the half-up rounding familiar from school, the half-to-even algorithm is used, which reduces the likelihood of bias in statistical analysis and is therefore recommended by the IEEE 754 standard.
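
A sketch of both effects, plus school-style rounding via the standard decimal module. The helper round_half_up is a hypothetical name, not a library function:

```python
from decimal import Decimal, ROUND_HALF_UP

# Half-to-even: ties go to the nearest even integer.
print(round(0.5), round(1.5), round(2.5))   # 0 2 2

# Floating point adds its own surprise: 2.675 is stored slightly below 2.675.
print(round(2.675, 2))                      # 2.67

def round_half_up(value, places='0.01'):
    # Convert through str so the decimal digits are taken as written.
    return Decimal(str(value)).quantize(Decimal(places), rounding=ROUND_HALF_UP)

print(round_half_up(2.675))                 # 2.68
print(round_half_up(2.5, '1'))              # 3
```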

I also could not understand why -22//10=-3, until another kind person pointed out that this inevitably follows from the mathematical definition itself, according to which the remainder cannot be negative, which leads to such unusual behavior for negative numbers.
ACHTUNG! Now this looks strange again and I don't understand anything; see this thread.
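
A sketch of how floor division and the remainder fit together in Python, next to truncating alternatives from the standard library:

```python
import math

print(-22 // 10, -22 % 10)        # -3 8
print(divmod(-22, 10))            # (-3, 8)

# The invariant a == (a // b) * b + a % b holds.
a, b = -22, 10
print((a // b) * b + a % b == a)  # True

# Truncation toward zero, as in C, is available separately.
print(int(-22 / 10))              # -2
print(math.fmod(-22, 10))         # -2.0
```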
