Code Like a Pythonista: Idiomatic Python (part2)

Original author: David Goodger
  • Translation
Kaa, the Python


After a short break, I present the final part of my translation of David Goodger's article “Code Like a Pythonista: Idiomatic Python”.


Links to the first and second parts.


I stress once again that the author is not breaking new ground in this article, and most Pythonistas will find no “special magic” in it. Still, it describes in considerable detail how to choose and use various Python constructs with readability and the spirit of PEP 8 in mind.
In some places the author's article has no source code examples. I left those places as they are and did not invent my own; it should be clear in principle what the author had in mind.





List Comprehensions


A list comprehension (“listcomp” for short) is a syntactic shortcut for the following pattern.
The traditional way, with for and if statements:
new_list = []
for item in a_list:
    if condition(item):
        new_list.append(fn(item))

As a list comprehension:
new_list = [fn(item) for item in a_list
            if condition(item)]


List comprehensions are clear and concise, up to a point. You can have multiple for loops and if conditions in a list comprehension, but beyond two or three in total, or when the conditions are complex, I suggest using regular for loops. Following the Zen of Python, choose the more readable way.
For example, a list of the squares of 0–9:
>>> [n ** 2 for n in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list of the squares of the odd numbers in 0–9:
>>> [n ** 2 for n in range(10) if n % 2]
[1, 9, 25, 49, 81]
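To illustrate where the readability limit lies, here is a minimal sketch with two for clauses and one condition, next to the equivalent nested loops. The colors and sizes lists are made up for the illustration and are not from the original article.

```python
# Hypothetical data: pair every colour with every size, skipping one combination.
colors = ["red", "green"]
sizes = ["S", "M", "L"]

# Two for clauses and one condition: still readable as a comprehension.
pairs = [(color, size)
         for color in colors
         for size in sizes
         if not (color == "green" and size == "S")]

# The same logic as regular nested loops; prefer this form once the
# clauses multiply or the conditions get complicated.
pairs_loop = []
for color in colors:
    for size in sizes:
        if not (color == "green" and size == "S"):
            pairs_loop.append((color, size))

print(pairs == pairs_loop)  # True
```

The for clauses of a comprehension nest in the order they are written, exactly like the loops below them.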




Generator Expressions (1)


Let's sum the squares of the numbers up to 100.
As a loop:
total = 0
for num in range(1, 101):
    total += num * num

We can let the sum function do the work for us by building the appropriate sequence.
As a list comprehension:
total = sum([num * num for num in range(1, 101)])

As a generator expression:
total = sum(num * num for num in xrange(1, 101))


Generator expressions (“genexps”) look just like list comprehensions, except that list comprehensions are “greedy” and generator expressions are “lazy”. A list comprehension computes the entire result list at once. A generator expression computes one value at a time, as needed. This is especially useful for long sequences where the computed list is only an intermediate step and not the final result.
In this case we are only interested in the total; we do not need the intermediate list of squares. We use xrange for the same reason: it produces values lazily, one per iteration.
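The difference between greedy and lazy can be observed directly. The sketch below assumes Python 3, where range is already lazy and takes the place of Python 2's xrange; the variable names are only illustrative.

```python
# Python 3: range is already lazy (it replaces Python 2's xrange).
squares_list = [num * num for num in range(1, 101)]   # built in full, right now
squares_gen = (num * num for num in range(1, 101))    # nothing computed yet

# A generator expression hands out one value at a time, on demand.
first = next(squares_gen)    # 1
second = next(squares_gen)   # 4

# sum() consumes whatever is left; the two values taken above are gone.
rest = sum(squares_gen)

total = sum(squares_list)
print(first + second + rest == total)  # True
print(total)  # 338350
```

Note that a generator expression can be consumed only once, whereas the list can be traversed as often as needed.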

Generator Expressions (2)


For example, if we were summing the squares of several billion numbers, a list comprehension would run out of memory, while a generator expression has no such problem. It still takes time, though!
total = sum(num * num
            for num in xrange(1, 1000000000))


The difference in syntax is that a list comprehension is enclosed in square brackets and a generator expression is not. Generator expressions do sometimes need enclosing parentheses, though, so you should always use them.
The rule of thumb:
  • Use a list comprehension when the computed list is the desired result.
  • Use a generator expression when the computed list is just an intermediate step.


Here is an example that came up recently at work.
(For some reason the example code is missing here. Translator's note.)
We need a dictionary mapping month numbers (both as strings and as integers) to month codes for futures contracts. It can be produced in one logical line of code.
(For some reason the example code is missing here. Translator's note.)
The following helps us do that:
  • dict() accepts a list of key/value pairs (2-tuples).
  • We have a list of month codes (each month code is a single letter, and a string is also just a sequence of letters). We run enumerate over this list to get both the code and its index.
  • Month numbers start at 1 but Python indexing starts at 0, so the month number is one more than the index.
  • We want to look up months both by string and by integer. We can use the int() and str() functions for this and loop over them.


A recent example:
month_codes = dict((fn(i+1), code)
    for i, code in enumerate('FGHJKMNQUVXZ')
    for fn in (int, str))

The resulting month_codes:
{ 1:  'F',  2:  'G',  3:  'H',  4:  'J', ...
 '1': 'F', '2': 'G', '3': 'H', '4': 'J', ...}
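To see what the two for clauses are doing, the one-liner can be unrolled into an explicit loop. This is only an illustrative rewrite; the names unrolled and month_number are not from the original.

```python
# The one-liner from the text.
month_codes = dict((fn(i + 1), code)
                   for i, code in enumerate('FGHJKMNQUVXZ')
                   for fn in (int, str))

# The same thing spelled out step by step.
unrolled = {}
for i, code in enumerate('FGHJKMNQUVXZ'):
    month_number = i + 1                    # months count from 1, indexes from 0
    unrolled[int(month_number)] = code      # integer key, e.g. 1
    unrolled[str(month_number)] = code      # string key, e.g. '1'

print(month_codes == unrolled)             # True
print(month_codes[3], month_codes['12'])   # H Z
```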



Sorting


Sorting a list in Python is easy:
a_list.sort()

(Note that the list is sorted in place: the original list itself is sorted, and the sort method does not return the list or a copy.)
But what if you have a list of data that you need to sort in something other than the natural order (i.e. sort by the first column, then by the second, and so on)? You may need to sort by the second column first, then by the fourth.

We can use the list's built-in sort method with a custom comparison function:
def custom_cmp(item1, item2):
    return cmp((item1[1], item1[3]),
               (item2[1], item2[3]))

a_list.sort(custom_cmp)


This works, but it is extremely slow with large lists.
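The snippet above is Python 2. In Python 3 the built-in cmp() and the comparison-function argument of list.sort no longer exist, but functools.cmp_to_key can adapt an old-style comparison function. A minimal sketch, with a hand-written cmp stand-in and made-up records:

```python
from functools import cmp_to_key

def cmp(a, b):
    """Stand-in for Python 2's built-in cmp(): negative, zero or positive."""
    return (a > b) - (a < b)

def custom_cmp(item1, item2):
    return cmp((item1[1], item1[3]),
               (item2[1], item2[3]))

# Hypothetical records; columns 1 and 3 are the sort criteria.
a_list = [
    ('w', 2, 'x', 'b'),
    ('y', 1, 'z', 'c'),
    ('u', 2, 'v', 'a'),
]
a_list.sort(key=cmp_to_key(custom_cmp))
print(a_list)
# [('y', 1, 'z', 'c'), ('u', 2, 'v', 'a'), ('w', 2, 'x', 'b')]
```

The comparison function is still called many times per item, so the performance concern raised in the text applies here as well.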

Sort with DSU *


DSU = Decorate-Sort-Undecorate
* Note: DSU is often no longer necessary. See the next section, “Key sorting”, for the newer approach.
Instead of writing a custom comparison function, we build an auxiliary list that sorts naturally:
# Decorate:
to_sort = [(item[1], item[3], item)
           for item in a_list]

# Sort:
to_sort.sort()

# Undecorate:
a_list = [item[-1] for item in to_sort]


The first line builds a list of tuples, each consisting of the sort criteria in the desired order followed by the complete data record (the item).
The second line does a native sort, which is fast and efficient.
The third line retrieves the last value from each tuple in the sorted list.
Remember, this last value is the complete data item (record). We throw away the sort criteria, which have done their job and are no longer needed.

This is a tradeoff of memory and complexity against run time. It is much simpler and faster, but we do have to duplicate the original list.
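Here are the three steps run on a small data set. The records are invented for the illustration. One thing to keep in mind: when two records tie on all the sort criteria, tuple comparison falls through to the records themselves.

```python
# Hypothetical records; columns 1 and 3 are the sort criteria.
a_list = [
    ('w', 2, 'x', 'b'),
    ('y', 1, 'z', 'c'),
    ('u', 2, 'v', 'a'),
]

# Decorate: put the sort criteria in front of each record.
to_sort = [(item[1], item[3], item) for item in a_list]
# Sort: plain tuple comparison, no custom function involved.
to_sort.sort()
# Undecorate: keep only the original record from each tuple.
a_list = [item[-1] for item in to_sort]

print(a_list)
# [('y', 1, 'z', 'c'), ('u', 2, 'v', 'a'), ('w', 2, 'x', 'b')]
```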

Key sorting


Python 2.4 introduced an optional “key” argument to the list sort method. It specifies a function of one argument that computes a comparison key for each list item. For example:
def my_key(item):
    return (item[1], item[3])

to_sort.sort(key=my_key)


The my_key function is called once for each item in the to_sort list.
You can write your own key function or use any existing one-argument function if applicable:
  • str.lower to sort alphabetically regardless of case.
  • len to sort by the length of the items (strings or containers).
  • int or float to sort numerically, as with numeric strings like "2", "123", "35".
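The three ready-made key functions from the list can be tried on small sample lists. The sketch uses the built-in sorted(), which accepts the same key argument as list.sort but returns a new list; the sample data is made up.

```python
words = ['banana', 'Cherry', 'fig']
plain = sorted(words)                    # uppercase sorts before lowercase
by_lower = sorted(words, key=str.lower)  # case-insensitive
by_len = sorted(words, key=len)          # shortest first, ties keep their order

numbers = ['2', '123', '35']
as_text = sorted(numbers)                # compared character by character
as_ints = sorted(numbers, key=int)       # compared numerically

print(plain)     # ['Cherry', 'banana', 'fig']
print(by_lower)  # ['banana', 'Cherry', 'fig']
print(by_len)    # ['fig', 'banana', 'Cherry']
print(as_text)   # ['123', '2', '35']
print(as_ints)   # ['2', '35', '123']
```

The key function only decides the order; the items in the result are the original, unconverted values.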


Generators


We have already seen generator expressions. We can write arbitrarily complex generators as functions:
def my_range_generator(stop):
    value = 0
    while value < stop:
        yield value
        value += 1

for i in my_range_generator(10):
    do_something(i)


The yield keyword turns a function into a generator. When you call a generator function, Python does not run its code; it returns a generator object which, as we recall, is an iterator and has a next method. A for loop simply calls the iterator's next method until a StopIteration exception is raised. You can raise StopIteration explicitly, or implicitly by falling off the end of the generator code, as above. (In Python 3.7 and later, use a plain return instead: a StopIteration raised inside a generator body is turned into a RuntimeError.)
Generators can simplify sequence and iterator handling because we do not need to build concrete lists; only one value is computed per iteration.

Here is how the for loop actually works. Python looks at the sequence given after the in keyword. If it is a simple container (such as a list, tuple, dictionary, set, or user-defined container), Python converts it into an iterator. If the object is already an iterator, Python uses it directly.
Python then repeatedly calls the iterator's next method, binds the return value to the loop variable (i in this case), and executes the loop body. This repeats until a StopIteration exception is raised or a break statement is executed in the loop body.
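The mechanism just described can be written out by hand. This is a sketch of roughly what a for loop does behind the scenes, using the built-in iter() and next() functions as spelled in Python 3; the seen list exists only to record what the loop body received.

```python
def my_range_generator(stop):
    value = 0
    while value < stop:
        yield value
        value += 1

seen = []
iterator = iter(my_range_generator(3))   # an iterator is returned unchanged
while True:
    try:
        i = next(iterator)               # Python 2 spelling: iterator.next()
    except StopIteration:
        break                            # the generator ran off its end
    seen.append(i)                       # the "loop body"

print(seen)  # [0, 1, 2]
```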
A for loop can have an else clause, whose code is executed after the loop finishes, but not if the loop was left through a break statement. This feature allows some very elegant solutions. The else clause is not needed always or even often with a for loop, but it can come in handy. Sometimes else expresses exactly the logic you need.
For example, if we need to check that a condition holds for some item, any item, in a sequence:
for item in sequence:
    if condition(item):
        break
else:
    raise Exception('Condition not satisfied.')
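The same pattern wrapped in a function makes a runnable sketch. The function name first_negative and the choice of ValueError are illustrative assumptions, not part of the original.

```python
def first_negative(numbers):
    """Return the first negative number, or raise if there is none."""
    for number in numbers:
        if number < 0:
            break               # found one: the else clause is skipped
    else:
        # Reached only when the loop ran to completion without a break.
        raise ValueError('Condition not satisfied.')
    return number

print(first_negative([3, -1, 4, -5]))  # -1
```

An empty sequence also ends up in the else clause, because the loop finishes without ever executing break.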




Generator example


Filter out empty lines from a CSV file (or items from a list):
import csv

def filter_rows(row_iterator):
    for row in row_iterator:
        if row:
            yield row

data_file = open(path, 'rb')
irows = filter_rows(csv.reader(data_file))
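To try the filter without a real file, here is a self-contained sketch for Python 3. An in-memory io.StringIO object stands in for the opened CSV file, and the sample content is made up.

```python
import csv
import io

def filter_rows(row_iterator):
    for row in row_iterator:
        if row:
            yield row

# Stand-in for a real file; note the blank line in the middle.
data_file = io.StringIO('a,b\n\nc,d\n')
irows = filter_rows(csv.reader(data_file))
rows = list(irows)   # nothing is read until the generator is consumed
print(rows)  # [['a', 'b'], ['c', 'd']]
```

With a real file in Python 3, the csv module documentation recommends opening it in text mode with newline='' rather than in 'rb' mode.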



Reading lines from a text file


datafile = open('datafile')
for line in datafile:
    do_something(line)

This works because files support a next method, as do other iterators: lists, tuples, dictionaries (for their keys), and generators.
A word of caution: because files are buffered, you cannot mix the .next and .read* methods unless you are using Python 2.5+.
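A self-contained sketch of the same loop follows. It writes a throwaway file first so that there is something to read, and it uses a with statement (an addition, not in the original) so the file is closed when the loop is done.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\nthird\n')
    path = tmp.name

lengths = []
with open(path) as datafile:         # closed automatically at the end of the block
    for line in datafile:            # one line at a time, never the whole file
        lengths.append(len(line.rstrip('\n')))

os.remove(path)
print(lengths)  # [5, 6, 5]
```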


EAFP vs. LBYL


EAFP: it's easier to ask forgiveness than permission.
LBYL: look before you leap.
Usually EAFP is preferable, but not always.
  • Duck typing
    If it walks like a duck, quacks like a duck and looks like a duck, then it's a duck. (Goose? Close enough.)
  • Exceptions
    Use coercion if an object must be of a particular type. If x must be a string for your code to work, why not call

    str(x)


    instead of trying something like

    isinstance(x, str)



EAFP try / except Example


You can wrap exception-prone code in a try/except block to catch the errors, and you will probably end up with a solution that is much more general than if you had tried to anticipate every possibility.
try:
    return str(x)
except TypeError:
    ...

Note: always specify the exceptions to catch. Never use a bare except clause. A bare except clause will catch every exception raised in your code, which makes debugging extremely difficult.
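As an illustration of naming the exceptions explicitly, here is a small sketch. The helper to_int and its default parameter are assumptions made up for this example.

```python
def to_int(text, default=None):
    """Hypothetical helper: convert text to an int, or return default."""
    try:
        return int(text)
    except (TypeError, ValueError):
        # Only the failures we expect from int(); anything else propagates.
        return default

print(to_int('42'))     # 42
print(to_int('forty'))  # None
print(to_int(None, 0))  # 0
```

A typo inside the try block, say a misspelled name, would still surface as a NameError instead of being silently swallowed.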

Import


from module import *


You have probably seen this “wild card” form of the import statement. Perhaps you even like it. Do not use it.
To adapt a well-known exchange:
(Exterior Dagobah, jungle, swamp, and mist.)
LUKE: Is from module import * better than explicit imports?
YODA: No, not better. Quicker, easier, more seductive.
LUKE: But how will I know why explicit imports are better than the wild-card form?
YODA: Know you will when your code you try to read six months from now.

Wild-card imports are from the dark side of Python.

Never!
from module import * severely pollutes the namespace. You will find objects in your local namespace that you did not expect to get. You may see imported names overriding names defined earlier in the module. You will not be able to figure out exactly where certain names come from. Although this form is short and convenient, it does not belong in production code.
Moral: do not use wild-card imports!
It is much better to:
  • reference names through their module (fully qualified identifiers that show their origin),
  • import a long module name via a shorter name (an alias),
  • or explicitly import just the names you need.

Namespace pollution alert!
Instead, reference names through their module (fully qualified identifiers that show their origin):
import module
module.name

or import a long module name via an alias:
import long_module_name as mod
mod.name

or explicitly import just the names you need:
from module import name
name


Note that this last form does not lend itself to use in the interactive interpreter, where you may want to edit and reload ("reload()") a module.
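A concrete case of the pollution described above: the standard os module exports its own low-level open() function, so a wild-card import from os hides the built-in open. The sketch runs the import inside a scratch dictionary (an arrangement chosen for the demonstration) so that it does not affect its own namespace.

```python
import builtins
import os

# Run the wild-card import in a scratch namespace so this sketch does not
# pollute its own globals.
namespace = {}
exec('from os import *', namespace)

# os exports a low-level open(), which now hides the built-in of that name.
print(namespace['open'] is os.open)        # True
print(namespace['open'] is builtins.open)  # False
```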

Modules and Scripts


To make a file both an importable module and an executable script:
if __name__ == '__main__':
    # script code here

When the file is imported, the module's __name__ attribute is set to the file name without the ".py" extension, so the code under the if statement does not run on import. When the file is executed as a script, __name__ is set to "__main__", and the script code does run.
Except for special cases, you should not put all the code at the top level. Put the code in functions, classes, and methods, and guard it with if __name__ == '__main__'.


Module structure


"""module docstring"""

# imports
# constants
# exception classes
# interface functions
# classes
# internal functions & classes

def main(...):
    ...

if __name__ == '__main__':
    status = main()
    sys.exit(status)


This is how the module should be structured.

Command line processing


Example: cmdline.py:
#!/usr/bin/env python

"""
Module docstring.
"""

import sys
import optparse

def process_command_line(argv):
    """
    Return a 2-tuple: (settings object, args list).
    `argv` is a list of arguments, or `None` for ``sys.argv[1:]``.
    """
    if argv is None:
        argv = sys.argv[1:]

    # initialize the parser object:
    parser = optparse.OptionParser(
        formatter=optparse.TitledHelpFormatter(width=78),
        add_help_option=None)

    # define options here:
    parser.add_option(      # customized description; put --help last
        '-h', '--help', action='help',
        help='Show this help message and exit.')

    settings, args = parser.parse_args(argv)

    # check number of arguments, verify values, etc.:
    if args:
        parser.error('program takes no command-line arguments; '
                     '"%s" ignored.' % (args,))

    # further process settings & args if necessary

    return settings, args

def main(argv=None):
    settings, args = process_command_line(argv)
    # application code here, like:
    # run(settings, args)
    return 0         # success

if __name__ == '__main__':
    status = main()
    sys.exit(status)
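Newer Python versions also ship the argparse module, which covers the same ground as optparse. The following is a sketch of the same skeleton rewritten with argparse; it is a swapped-in technique, not the author's code, and the --verbose option is a hypothetical addition that exists only to give the parser something to handle.

```python
import argparse

def process_command_line(argv=None):
    """Return the parsed settings; `argv` is a list, or None for sys.argv[1:]."""
    parser = argparse.ArgumentParser(description='Module docstring.')
    # Hypothetical option, only here to have something to parse.
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='Print more output.')
    # No positional arguments are declared, so any stray argument makes
    # argparse report an error and exit, like parser.error() in the original.
    return parser.parse_args(argv)

def main(argv=None):
    settings = process_command_line(argv)
    if settings.verbose:
        print('verbose mode on')
    # application code here
    return 0         # success

# Called with an explicit argument list so the sketch does not depend on
# the real command line.
status = main(['--verbose'])
print(status)  # 0
```

In a real script the last two lines would be replaced by the usual if __name__ == '__main__' block that passes the result of main() to sys.exit.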




Packages


package/
    __init__.py
    module1.py
    subpackage/
        __init__.py
        module2.py

  • Used to organize your project.
  • Reduce the number of entries in the load path.
  • Reduce import name conflicts.

Example:
import package.module1
from package.subpackage import module2
from package.subpackage.module2 import name
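To try these imports without setting up a project, the layout can be built in a temporary directory. The sketch below does that; demo_package and the contents of the modules are placeholders invented for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Build the layout from the text in a temporary directory. The names
# demo_package, module1, subpackage, module2 and name are placeholders.
root = tempfile.mkdtemp()
subpackage_dir = os.path.join(root, 'demo_package', 'subpackage')
os.makedirs(subpackage_dir)

files = {
    os.path.join(root, 'demo_package', '__init__.py'): '',
    os.path.join(root, 'demo_package', 'module1.py'): 'value = 1\n',
    os.path.join(subpackage_dir, '__init__.py'): '',
    os.path.join(subpackage_dir, 'module2.py'): 'name = "from module2"\n',
}
for path, source in files.items():
    with open(path, 'w') as handle:
        handle.write(source)

sys.path.insert(0, root)          # one load-path entry serves the whole package
importlib.invalidate_caches()

import demo_package.module1
from demo_package.subpackage import module2
from demo_package.subpackage.module2 import name

print(demo_package.module1.value, module2.name, name)
```

A single entry on the load path is enough to reach every module in the package, which is the point of the second bullet above.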


Python 2.5 introduced absolute and relative imports, enabled through a future import:
from __future__ import absolute_import


I have not delved into these deeply enough myself yet, so we will leave this part of the discussion out.


Simple is better than complex


Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
—Brian W. Kernighan, co-author of The C Programming Language and the “K” in “AWK”

In other words, keep your programs simple!

Do not reinvent the wheel


Before writing any code:
  • Check out the standard Python library.
  • Check the Python Package Index (the “Cheese Shop”): http://cheeseshop.python.org/pypi
    (apparently a nod to the Cheese Shop sketch; I found something similar in a wiki textbook. Translator's note)
  • Search the web. Google is your friend.

