SLY_G May 14, 2015 at 02:25

Introduction to functional Python programming

Transfer

Talking about functional programming, people often start to give out a bunch of "functional" characteristics. Immutable data, first-class functions, and tail recursion optimization. These are language features that help you write functional programs. They mention mapping, currying, and the use of higher order functions. These are programming techniques used to write functional code. They mention parallelization, lazy computing, and determinism. These are the benefits of functional programs.

Hammer. Functional code has one property: the absence of side effects. It does not rely on data outside the current function, and does not change data outside the function. All other "properties" can be deduced from this.

Non-functional function:

a = 0
def increment1():
    global a
    a += 1

Functional Function:

def increment2(a):
    return a + 1

Instead of going through the list, use map and reduce

Map

Accepts a function and data set. Creates a new collection, performs a function at each data item, and adds the return value to the new collection. Returns a new collection.

A simple map that accepts a list of names and returns a list of lengths:

name_lengths = map(len, ['Маша', 'Петя', 'Вася'])
print name_lengths
# => [4, 4, 3]

This map squares each element:

squares = map(lambda x: x * x, [0, 1, 2, 3, 4])
print squares
# => [0, 1, 4, 9, 16]

It does not accept a named function, but takes an anonymous function defined via lambda. Lambda parameters are defined to the left of the colon. The body of the function is on the right. The result is returned implicitly.

The non-functional code in the following example takes a list of names and replaces them with random nicknames.

import random
names = ['Маша', 'Петя', 'Вася']
code_names = ['Шпунтик', 'Винтик', 'Фунтик']
for i in range(len(names)):
    names[i] = random.choice(code_names)
print names
# => ['Шпунтик', 'Винтик', 'Шпунтик']

The algorithm can assign the same nicknames to different secret agents. Hopefully this will not be a source of problems during a secret mission.

We rewrite this via map:

import random
names = ['Маша', 'Петя', 'Вася']
secret_names = map(lambda x: random.choice(['Шпунтик', 'Винтик', 'Фунтик']), names)

Exercise 1 . Try rewriting the following code through map. It takes a list of real names and replaces them with nicknames using a more reliable method.

names = ['Маша', 'Петя', 'Вася']
for i in range(len(names)):
    names[i] = hash(names[i])
print names
# => [6306819796133686941, 8135353348168144921, -1228887169324443034]

My decision:

names = ['Маша', 'Петя', 'Вася']
secret_names = map(hash, names)

Reduce

Reduce accepts a function and a set of items. Returns the value obtained by combining all items.

An example of a simple reduce. Returns the sum of all items in a set:

sum = reduce(lambda a, x: a + x, [0, 1, 2, 3, 4])
print sum
# => 10

x is the current item, and is the battery. This is the value that lambda returns in the previous paragraph. reduce () iterates over all the values, and runs for each lambda on the current values of a and x, and returns the result in a for the next iteration.

And what is the same in the first iteration? It is equal to the first element of the collection, and reduce () starts with the second element. That is, the first x will be equal to the second item in the set.

The following example considers how often the word “captain” appears in the list of lines:

sentences = ['капитан джек воробей',
             'капитан дальнего плавания',
             'ваша лодка готова, капитан']
cap_count = 0
for sentence in sentences:
    cap_count += sentence.count('капитан')
print cap_count
# => 3

The same code using reduce:

sentences = ['капитан джек воробей',
             'капитан дальнего плавания',
             'ваша лодка готова, капитан']
cap_count = reduce(lambda a, x: a + x.count('капитан'),
                   sentences,
                   0)

And where does the initial value of a come from? It cannot be calculated from the number of repetitions in the first row. Therefore, it is specified as the third argument to the reduce () function.

Why is map and reduce better?

Firstly, they usually fit on one line.

Secondly, the important parts of the iteration — collection, operation, and return value — are always in the same place as map and reduce.

Thirdly, the code in the loop can change the value of previously defined variables, or affect the code located after it. By convention, map and reduce are functional.

Fourth, map and reduce are elementary operations. Instead of reading the loops line by line, it’s easier for the reader to perceive the map and reduce built into complex algorithms.

Fifth, they have many friends that allow the useful, slightly modified behavior of these functions. For example, filter, all, any and find.

Exercise 2: rewrite the following code using map, reduce and filter. Filter accepts function and collection. Returns a collection of those things for which the function returns True.

people = [{'имя': 'Маша', 'рост': 160},
    {' рост ': 'Саша', ' рост ': 80},
    {'name': 'Паша'}]
height_total = 0
height_count = 0
for person in people:
    if 'рост' in person:
        height_total += person[' рост ']
        height_count += 1
if height_count > 0:
    average_height = height_total / height_count
    print average_height
    # => 120

My decision:

people = [{'имя': 'Маша', 'рост': 160},
    {' рост ': 'Саша', ' рост ': 80},
    {'name': 'Паша'}]
heights = map(lambda x: x['рост'],
              filter(lambda x: 'рост' in x, people))
if len(heights) > 0:
    from operator import add
    average_height = reduce(add, heights) / len(heights)

Write declaratively, not imperatively

The following program emulates a race of three cars. At each point in time, the car either moves forward or not. Each time the program displays the path traveled by cars. After five intervals, the race ends.

Output Examples:

 -
 - -
 - -
 - -
 - -
 - - -
 - - -
 - -
 - - -
 - - - -
 - - -
 - - - -
 - - - -
 - - - -
 - - - - -

Program text:

from random import random
time = 5
car_positions = [1, 1, 1]
while time:
    # decrease time
    time -= 1
    print ''
    for i in range(len(car_positions)):
        # move car
        if random() > 0.3:
            car_positions[i] += 1
        # draw car
        print '-' * car_positions[i]

The code is imperative. A functional version would be declarative - it would describe what needs to be done, and not how it should be done.

We use functions

Declarability can be achieved by inserting code into functions:

from random import random
def move_cars():
    for i, _ in enumerate(car_positions):
        if random() > 0.3:
            car_positions[i] += 1
def draw_car(car_position):
    print '-' * car_position
def run_step_of_race():
    global time
    time -= 1
    move_cars()
def draw():
    print ''
    for car_position in car_positions:
        draw_car(car_position)
time = 5
car_positions = [1, 1, 1]
while time:
    run_step_of_race()
    draw()

To understand the program, the reader looks through the main loop. “If there is time left, we will go through one step of the race and display the result. Check the time again. ” If the reader needs to understand how the race step works, he will be able to read his code separately.

Comments are not needed, the code explains itself.

Dividing the code into functions makes the code more readable. This technique uses functions, but only as routines. They pack code, but do not make it functional. Functions affect the code that surrounds them and change global variables, rather than returning values. If the reader encounters a variable, he will need to find where it came from.

Here is the functional version of this program:

from random import random
def move_cars(car_positions):
    return map(lambda x: x + 1 if random() > 0.3 else x,
               car_positions)
def output_car(car_position):
    return '-' * car_position
def run_step_of_race(state):
    return {'time': state['time'] - 1,
            'car_positions': move_cars(state['car_positions'])}
def draw(state):
    print ''
    print '\n'.join(map(output_car, state['car_positions']))
def race(state):
    draw(state)
    if state['time']:
        race(run_step_of_race(state))
race({'time': 5,
      'car_positions': [1, 1, 1]})

Now the code is divided into functional functions. There are three signs of this. The first - there are no shared variables. time and car_positions are passed directly to race (). The second is that functions take parameters. Third - variables do not change inside functions, all values are returned. Each time run_step_of_race () takes the next step, it is passed again to the next.

Here are two functions zero () and one ():

def zero(s):
    if s[0] == "0":
        return s[1:]
def one(s):
    if s[0] == "1":
        return s[1:]

zero () takes a string s. If the first character is 0, then returns the rest of the string. If not, then None. one () does the same if the first character is 1.

Imagine the rule_sequence () function. It takes a string and a list of rule functions, consisting of zero and one functions. It invokes the first rule, passing it a string. If None is not returned, then it takes the returned value and calls the next rule. Etc. If None is returned, rule_sequence () stops and returns None. Otherwise, the meaning of the last rule.

Examples of input and output data:

print rule_sequence('0101', [zero, one, zero])
# => 1
print rule_sequence('0101', [zero, zero])
# => None

Imperative version of rule_sequence ():

def rule_sequence(s, rules):
    for rule in rules:
        s = rule(s)
        if s == None:
            break
    return s

Exercise 3 . This code uses a loop. Rewrite it declaratively using recursion.

My decision:

def rule_sequence(s, rules):
    if s == None or not rules:
        return s
    else:
        return rule_sequence(rules[0](s), rules[1:])

Use pipelines

Now we rewrite the other kind of cycles using a technique called a pipeline.

The next cycle changes the dictionaries containing the name, the wrong country of origin and the status of some groups.

bands = [{'name': 'sunset rubdown', 'country': 'UK', 'active': False},
         {'name': 'women', 'country': 'Germany', 'active': False},
         {'name': 'a silver mt. zion', 'country': 'Spain', 'active': True}]
def format_bands(bands):
    for band in bands:
        band['country'] = 'Canada'
        band['name'] = band['name'].replace('.', '')
        band['name'] = band['name'].title()
format_bands(bands)
print bands
# => [{'name': 'Sunset Rubdown', 'active': False, 'country': 'Canada'},
#     {'name': 'Women', 'active': False, 'country': 'Canada' },
#     {'name': 'A Silver Mt Zion', 'active': True, 'country': 'Canada'}]

The name of the function “format” is too general. In general, the code is a little worrisome. Three different things happen in one cycle. The value of the 'country' key changes to 'Canada'. The dots are removed and the first letter of the name is changed to capital. It’s hard to understand what code should do, and it’s hard to say if it does. It is hard to use, test and parallelize.

Compare:

print pipeline_each(bands, [set_canada_as_country,
                            strip_punctuation_from_name,
                            capitalize_names])

Everything is simple. Helper functions look functional because they are chained. Exit previous - enter the next. They are easy to verify, reuse, validate and parallelize.

pipeline_each () iterates over the groups one at a time, and passes them to the transformation functions, like set_canada_as_country (). After applying the function to all groups, pipeline_each () makes a list of them and passes it to the next.

Let's look at the conversion functions.

def assoc(_d, key, value):
    from copy import deepcopy
    d = deepcopy(_d)
    d[key] = value
    return d
def set_canada_as_country(band):
    return assoc(band, 'country', "Canada")
def strip_punctuation_from_name(band):
    return assoc(band, 'name', band['name'].replace('.', ''))
def capitalize_names(band):
    return assoc(band, 'name', band['name'].title())

Each associates a group key with a new value. Without changing the original data, this is hard to do, so we solve it with assoc (). It uses deepcopy () to create a copy of the passed dictionary. Each function converts a copy and returns that copy.

Everything seems to be normal. Original data protected from change. But there are two potential places for data changes in the code. In strip_punctuation_from_name (), a dotless name is created by calling calling replace () with the original name. In capitalize_names (), a name is created with the first capital letter based on title () and the original name. If replace and time are not functional, then strip_punctuation_from_name () with capitalize_names () are not functional.

Fortunately, they are functional. In Python, strings are immutable. These functions work with string copies. Uff, thank god.

This contrast between strings and dictionaries (their mutability) in Python demonstrates the benefits of languages like Clojure. There, the programmer does not need to think if he will change the data. It will not change.

Exercise 4 . Try to make the pipeline_each function. Think about the sequence of operations. Groups - in an array, are passed one at a time for the first conversion function. Then the resulting array is passed one little thing for the second function, and so on.

My decision:

def pipeline_each(data, fns):
    return reduce(lambda a, x: map(x, a),
                  fns,
                  data)

All three conversion functions as a result change the specific field of the group. call () can be used to create an abstraction for this. It accepts the function and the key to which it will be applied.

set_canada_as_country = call(lambda x: 'Canada', 'country')
strip_punctuation_from_name = call(lambda x: x.replace('.', ''), 'name')
capitalize_names = call(str.title, 'name')
print pipeline_each(bands, [set_canada_as_country,
                            strip_punctuation_from_name,
                            capitalize_names])

Or, sacrificing readability:

print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
                            call(lambda x: x.replace('.', ''), 'name'),
                            call(str.title, 'name')])

Code for call ():

def assoc(_d, key, value):
    from copy import deepcopy
    d = deepcopy(_d)
    d[key] = value
    return d
def call(fn, key):
    def apply_fn(record):
        return assoc(record, key, fn(record.get(key)))
    return apply_fn

What is going on here?

One. call is a higher order function, because takes another function as an argument and returns a function.

Two. apply_fn () is similar to conversion functions. Gets the entry (group). Searches for the value of record [key]. Calls fn. Assigns the result to a copy of the record and returns it.

Three. call does nothing itself. All work is done by apply_fn (). In the pipeline_each () example, one instance of apply_fn () sets 'country' to 'Canada'. Another - capitalizes the first letter.

Four. When the apply_fn () instance is executed, the fn and key functions will not be available in scope. These are not apply_fn () arguments or local variables. But access to them will be. When defining a function, it saves references to variables that it closes — those that were defined outside the function and are used internally. When the function starts, variables are searched among local, then among arguments, and then among references to closed ones. There are fn and key.

Five. There is no mention of groups in call. This is because call can be used to create any pipelines, regardless of their contents. Functional programming, in particular, builds a library of common functions suitable for compositions and for reuse.

Well done. Closures, higher order functions, and scope are all in a few paragraphs. You can also drink a tea with cookies.

There remains one more processing of these groups. Remove everything except the name and country. Extract_name_and_country () function:

def extract_name_and_country(band):
    plucked_band = {}
    plucked_band['name'] = band['name']
    plucked_band['country'] = band['country']
    return plucked_band
print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
                            call(lambda x: x.replace('.', ''), 'name'),
                            call(str.title, 'name'),
                            extract_name_and_country])
# => [{'name': 'Sunset Rubdown', 'country': 'Canada'},
#     {'name': 'Women', 'country': 'Canada'},
#     {'name': 'A Silver Mt Zion', 'country': 'Canada'}]

extract_name_and_country () could be written in a generalized form called pluck (). It would be used like this:

print pipeline_each(bands, [call(lambda x: 'Canada', 'country'),
                            call(lambda x: x.replace('.', ''), 'name'),
                            call(str.title, 'name'),
                            pluck(['name', 'country'])])

Exercise 5 . pluck accepts a list of keys to extract from the records. Try to write it. This is a higher order function.

My decision:

def pluck(keys):
    def pluck_fn(record):
        return reduce(lambda a, x: assoc(a, x, record[x]),
                      keys,
                      {})
    return pluck_fn

And now what?

Functional code goes well with traditional. Conversions from this article can be used in any language. Try it and you for your code.

Remember about Masha, Petya and Vasya. Turn list iterations into map and reduces.

Remember the race. Break the code into functions, and make them functional. Turn the loop into recursion.

Remember the groups. Turn a sequence of operations into a pipeline.

Tags:

Introduction to functional Python programming

Map

Reduce

Why is map and reduce better?

Write declaratively, not imperatively

We use functions

Use pipelines

Also popular now: