SLY_G October 21, 2014 at 21:22

Expressive JavaScript: Higher Order Functions

Transfer

Content

Tsu-li and Tsu-su boasted the size of their new programs. “Two hundred thousand lines,” said Tsu-li, ““ not counting the comments! ”Tsu-su replied:“ Pf-f, mine is almost a million lines. ” Master Yun-Ma said: "My best program takes five hundred lines." Hearing this, Tsu-li and Tsu-su experienced enlightenment.

Yun-Ma Master, Programming Book

There are two ways to build programs: to make them so simple that there will obviously be no errors, or so complex that there will be no obvious errors.

Anthony Hoar, 1980 Turing Lecture

A large program is a costly program, and not just because of the time it was written. Large sizes usually mean complexity, and complexity confuses programmers. Confused programmers make mistakes in programs. A large program means that bugs have a place to hide and are more difficult to find.

Let us return briefly to two examples from the introduction. The first is self-sufficient, and takes six lines.

var total = 0, count = 1;
while (count <= 10) {
  total += count;
  count += 1;
}
console.log(total);

The second is based on two external functions and occupies one line.

console.log(sum(range(1, 10)));

Which of them is more likely to encounter a mistake?

If we add the size of the sum and range definitions, the second program will also be large - larger than the first. But I still claim that it is likely to be correct.

This will be because the expression of the solution directly relates to the problem being solved. Summation of a numerical interval is not cycles and counters. These are amounts and gaps.

The definitions of this dictionary (sum and range functions) will include loops, counters, and other random details. But because they express simpler concepts than the whole program, they are easier to do correctly.

Abstraction

In a programmatic context, these “dictionary” definitions are often called abstractions. Abstractions hide details and give us the opportunity to talk about tasks at a higher, or more abstract, level.

Compare two recipes for pea soup:

Add one cup of dried peas per serving to the bowl. Add water so that it covers the peas. Leave it like that for at least 12 hours. Remove the peas from the water and place them in the pan. Add 4 cups of water per serving. Close the pan and simmer the peas for two hours. Take half the onion per serving. Cut into pieces with a knife, add to the peas. Take one stalk of celery per serving. Cut into pieces with a knife, add to the peas. Take a carrot per serving. Cut into pieces with a knife, add to the peas. Cook another 10 minutes.

The second recipe:

Per serving: 1 cup dry peas, half an onion, a celery stalk, carrot.
Soak the peas for 12 hours. Stew for 2 hours in 4 cups of water per serving. Cut and add vegetables. Cook another 10 minutes.

The second is shorter and simpler. But you need to know a little more concepts related to cooking - soaking, stewing, cutting (and vegetables).

When programming, we cannot rely on the fact that all the necessary words will be in our dictionary. Because of this, you can slip into the pattern of the first recipe: dictate to the computer all the small steps one after another, not noticing the concepts of a higher level that they express.

The second nature of the programmer should be the ability to notice when a concept begs to come up with a new word for it and put it into abstraction.

Abstract array traversal

The simple functions we used before are good for building abstractions. But sometimes they are not enough.

In the previous chapter, we saw this cycle several times:

var array = [1, 2, 3];
for (var i = 0; i < array.length; i++) {
  var current = array[i];
  console.log(current);
}

The code tries to say: "for each element in the array, print it to the console." But it uses a workaround - with a variable for counting i, checking the length of the array, and declaring an additional variable, current. Not only is he not very handsome, he is also the basis for potential mistakes. We can accidentally reuse the variable i, instead of length write lenght, confuse the variables i and current, etc.

Let's abstract it into a function. Can you think of a way to do this?

It’s quite simple to write a function that traverses an array and calls for each element of console.log

function logEach(array) {
  for (var i = 0; i < array.length; i++)
    console.log(array[i]);
}

But what if we need to do something other than display elements in the console? Since “doing something” can be represented as a function, and functions are just variables, we can pass this action as an argument:

function forEach(array, action) {
  for (var i = 0; i < array.length; i++)
    action(array[i]);
}
forEach(["Тили", "Мили", "Трямдия"], console.log);
// → Тили
// → Мили
// → Трямдия

Often you can not pass a predefined function to forEach, but create a function right on the spot.

var numbers = [1, 2, 3, 4, 5], sum = 0;
forEach(numbers, function(number) {
  sum += number;
});
console.log(sum);
// → 15

It looks like a classic for loop, with a loop body written in a block. However, now the body is inside the function, and also inside the forEach call brackets. Therefore, it must be closed with both curly and parentheses.

Using this template, we can set the variable name for the current array element (number), without having to manually select it from the array.

In general, we don’t even need to write forEach ourselves. This is the standard method of arrays. Since the array is already passed as the variable we are working on, forEach accepts only one argument - the function that needs to be executed for each element.

To demonstrate the convenience of this approach, let us return to the function from the previous chapter. It contains two cycles passing through arrays:

function gatherCorrelations(journal) {
  var phis = {};
  for (var entry = 0; entry < journal.length; entry++) {
    var events = journal[entry].events;
    for (var i = 0; i < events.length; i++) {
      var event = events[i];
      if (!(event in phis))
        phis[event] = phi(tableFor(event, journal));
    }
  }
  return phis;
}

Using forEach we record a little shorter and much cleaner.

function gatherCorrelations(journal) {
  var phis = {};
  journal.forEach(function(entry) {
    entry.events.forEach(function(event) {
      if (!(event in phis))
        phis[event] = phi(tableFor(event, journal));
    });
  });
  return phis;
}

Higher Order Functions

Functions that operate on other functions — either taking them as arguments or returning them — are called higher-order functions . If you already understood that functions are just variables, there is nothing special about the existence of such functions. The term comes from mathematics, where the differences between functions and other meanings are perceived more strictly.

Higher-order functions allow us to abstract actions, not just values. They are different. For example, you can make a function that creates new functions.

function greaterThan(n) {
  return function(m) { return m > n; };
}
var greaterThan10 = greaterThan(10);
console.log(greaterThan10(11));
// → true

You can make a function that changes other functions.

function noisy(f) {
  return function(arg) {
    console.log("calling with", arg);
    var val = f(arg);
    console.log("called with", arg, "- got", val);
    return val;
  };
}
noisy(Boolean)(0);
// → calling with 0
// → called with 0 - got false

You can even make functions that create new types of program flow control.

function unless(test, then) {
  if (!test) then();
}
function repeat(times, body) {
  for (var i = 0; i < times; i++) body(i);
}
repeat(3, function(n) {
  unless(n % 2, function() {
    console.log(n, "is even");
  });
});
// → 0 is even
// → 2 is even

The lexical scope rules that we discussed in chapter 3 work to our advantage in such cases. In the last example, the variable n is an argument to an external function. Since the inner function lives surrounded by the outer, it can use n. The bodies of such internal functions have access to the variables surrounding them. They can play the role of blocks {} used in regular loops and conditional expressions. An important difference is that variables declared inside internal functions do not fall into the external environment. And usually this is only for the best.

Passing arguments

The noisy function, declared earlier, which passes its argument to another function, is not entirely convenient.

function noisy(f) {
  return function(arg) {
    console.log("calling with", arg);
    var val = f(arg);
    console.log("called with", arg, "- got", val);
    return val;
  };
}

If f takes more than one parameter, it will receive only the first. It would be possible to add a bunch of arguments to the internal function (arg1, arg2, etc.) and pass all of them to f, but it is not known how much is enough for us. In addition, the function f could not work correctly with arguments.length. Since we would always pass the same number of arguments, it would not be known how many arguments were given to us initially.

For such cases, functions in JavaScript have an apply method. An array (or an object in the form of an array) of arguments is passed to him, and he calls a function with these arguments.

function transparentWrapping(f) {
  return function() {
    return f.apply(null, arguments);
  };
}

This function is useless, but it demonstrates the template that interests us - the function returned by it passes to f all the arguments it received, but nothing more. This happens by passing its own arguments stored in the arguments object to the apply method. The first argument to the apply method, which we set to null in this case, can be used to emulate a method call. We will return to this issue in the next chapter.

Json

Higher-order functions that somehow apply the function to array elements are widespread in JavaScript. The forEach method is one of the most primitive of these functions. As array methods, we have access to many other options for functions. To get to know them, let's play with another dataset.

Some years ago, someone examined many archives and made a whole book on the history of my family name. I opened it, hoping to find knights, pirates and alchemists there ... But it turned out that it was filled mainly with Flemish farmers. For entertainment, I extracted information on my immediate ancestors and designed it in a format suitable for reading by a computer.

The file looks something like this:

[
  {"name": "Emma de Milliano", "sex": "f",
   "born": 1876, "died": 1956,
   "father": "Petrus de Milliano",
   "mother": "Sophia van Damme"},
  {"name": "Carolus Haverbeke", "sex": "m",
   "born": 1832, "died": 1905,
   "father": "Carel Haverbeke",
   "mother": "Maria van Brussel"},
  … и так далее
]

This format is called JSON, which means JavaScript Object Notation. It is widely used in data storage and network communications.

JSON is similar to JavaScript in the way of writing arrays and objects - with some limitations. All property names must be enclosed in double quotes, and only simple values are allowed - no function calls, variables, nothing that would include calculations. Comments are also not allowed.

JavaScript provides the JSON.stringify and JSON.parse functions that convert data from this format to this format. The first takes a value and returns a string with JSON. The second takes such a line and returns a value.

var string = JSON.stringify({name: "X", born: 1980});
console.log(string);
// → {"name":"X","born":1980}
console.log(JSON.parse(string).born);
// → 1980

The ANCESTRY_FILE variable is available here . It contains the JSON file as a string. Let's decode it and count the number of people mentioned.

var ancestry = JSON.parse(ANCESTRY_FILE);
console.log(ancestry.length);
// → 39

Filter the array

To find people who were young in 1924, the following function may come in handy. It filters out the elements of the array that fail the test.

function filter(array, test) {
  var passed = [];
  for (var i = 0; i < array.length; i++) {
    if (test(array[i]))
      passed.push(array[i]);
  }
  return passed;
}
console.log(filter(ancestry, function(person) {
  return person.born > 1900 && person.born < 1925;
}));
// → [{name: "Philibert Haverbeke", …}, …]

An argument called test is used - this is a function that performs verification calculations. It is called for each element, and the value returned by it determines whether this element falls into the returned array.

There were three people in the file who were young in 1924 - grandfather, grandmother and cousin.

Please note that the filter function does not delete elements from an existing array, but builds a new one containing only validated elements. This is a pure function because it does not spoil the array passed to it.

Like forEach, filter is one of the standard array methods. In the example, we described such a function, only to show what it does inside. From now on, we will use it simply:

onsole.log(ancestry.filter(function(person) {
  return person.father == "Carel Haverbeke";
}));
// → [{name: "Carolus Haverbeke", …}]

Transformations with map

Suppose we have an archive of objects representing people, which was obtained by filtering an array of ancestors. But we need an array of names that would be easier to read.

The map method transforms an array by applying a function to all its elements and constructing a new array from the returned values. The new array will have the same length as the input, but its contents will be converted to the new format.

function map(array, transform) {
  var mapped = [];
  for (var i = 0; i < array.length; i++)
    mapped.push(transform(array[i]));
  return mapped;
}
var overNinety = ancestry.filter(function(person) {
  return person.died - person.born > 90;
});
console.log(map(overNinety, function(person) {
  return person.name;
}));
// → ["Clara Aernoudts", "Emile Haverbeke",
//    "Maria Haverbeke"]

Interestingly, people who have lived at least 90 years old are the same that we saw earlier, who were young in the 1920s. This is just the newest generation in my records. Apparently, medicine has seriously improved.

Like forEach and filter, map is also a standard method for arrays.

Summation with reduce

Another popular example of working with arrays is to get a single value based on the data in the array. One example is the summation of a list of numbers already familiar to us. The other is the search for the person born before everyone else.

A higher-order operation of this type is called reduce (reduction; or sometimes fold, folding). You can imagine it as folding an array, one element at a time. When summing the numbers, we started from zero, and for each element we combined it with the current sum using addition.

The parameters of the reduce function, in addition to the array, are a combining function and an initial value. This function is a little less clear than filter or map, so pay close attention to it.

function reduce(array, combine, start) {
  var current = start;
  for (var i = 0; i < array.length; i++)
    current = combine(current, array[i]);
  return current;
}
console.log(reduce([1, 2, 3, 4], function(a, b) {
  return a + b;
}, 0));
// → 10

The standard reduce arrays method, which of course works the same way, is even more convenient. If the array contains at least one element, you can omit the start argument. The method will take the first element of the array as the starting value and begin work from the second.

In order to find the oldest of my ancestors known with reduce, we can write something like:

console.log(ancestry.reduce(function(min, cur) {
  if (cur.born < min.born) return cur;
  else return min;
}));
// → {name: "Pauwels van Haverbeke", born: 1535, …}

Composability

How could we write the previous example (searching for a person with the earliest date of birth) without higher-order functions? Actually, the code is not so terrible:

var min = ancestry[0];
for (var i = 1; i < ancestry.length; i++) {
  var cur = ancestry[i];
  if (cur.born < min.born)
    min = cur;
}
console.log(min);
// → {name: "Pauwels van Haverbeke", born: 1535, …}

Slightly more variables, two lines longer - but so far reasonably clear code.

Higher-order functions truly open up when you have to combine functions. For example, we write a code that finds the average age of men and women in a set.

function average(array) {
  function plus(a, b) { return a + b; }
  return array.reduce(plus) / array.length;
}
function age(p) { return p.died - p.born; }
function male(p) { return p.sex == "m"; }
function female(p) { return p.sex == "f"; }
console.log(average(ancestry.filter(male).map(age)));
// → 61.67
console.log(average(ancestry.filter(female).map(age)));
// → 54.56

(It’s silly that we have to define addition as a plus function, but the operators in JavaScript are not values, so you won’t pass them as arguments).

Instead of embedding the algorithm in a large cycle, everything is distributed according to the concepts that interest us - determining sex, counting age, and averaging numbers. We apply them in turn to get the result.

For comprehensible code, this is a fantastic opportunity. Of course, clarity does not come for free.

Price

In the happy land of elegant code and beautiful rainbows, there is a gadish monster named Inefficiency.

The program that processes the array is most beautifully represented as a sequence of clearly separated steps, each of which does something with the array and returns a new array. But layering all these intermediate arrays is expensive.

Similarly, passing a function to forEach so that it goes through the array for us is convenient and easy to understand. But calling functions in JavaScript is more expensive than loops.

This is also the case with many techniques that improve program readability. Abstractions add layers between the clean work of a computer and the concepts we work with - and as a result, the computer does more work. This is not an ironclad rule - there are languages that allow you to add abstractions without sacrificing efficiency, and even in JavaScript an experienced programmer can find ways to write abstract and fast code. But this problem is common.

Fortunately, most computers are insanely fast. If your data set is not too large, or the run time should be just fast enough from the point of view of a person (for example, to do something every time a user clicks on a button) - then it doesn’t matter if you wrote a beautiful solution that works half a millisecond, or very optimized, that works one tenth of a millisecond.

It is convenient to roughly calculate how often this piece of code will be called. If you have a loop in a loop (directly, or through a call in a loop of a function that internally also works with a loop), the code will be executed N * M times, where N is the number of repetitions of the outer loop and M is the inner one. If there is another cycle in the inner loop repeating P times, then we already get N * M * P - and so on. This can lead to large numbers, and when the program slows down, the problem can often be reduced to a small piece of code inside the innermost loop.

Great-great-great-great-great- ...

My grandfather, Philibert Haverbeke, is mentioned in the data file. Starting with him, I can track my family in search of the oldest of my ancestors, Powels van Haverbeck, my direct ancestor. Now I want to calculate what percentage of DNA I have from him (in theory).

To go from the name of the ancestor to the object representing it, we build an object that matches names and people.

var byName = {};
ancestry.forEach(function(person) {
  byName[person.name] = person;
});
console.log(byName["Philibert Haverbeke"]);
// → {name: "Philibert Haverbeke", …}

The task is not just to find a father in each of the records and calculate how many steps it takes to Powels. In family history there have been several marriages between cousins (well, small villages, etc.). In this regard, the branches of the family tree in some places are connected with others, so I get more genes than 1/2 G (G is the number of generations between Powels and me). This formula is based on the assumption that each generation splits the genetic fund in two.

It would be reasonable to draw an analogy with reduce, where the array is reduced to a single value by sequentially combining data from left to right. Here we also need to get a singular, but we must follow the lines of heredity. And they form not a simple list, but a tree.

We consider this value for a particular person, combining these values of his ancestors. This can be done recursively. If we need some kind of person, we need to calculate the necessary value for his parents, which in turn requires calculation for her ancestors, etc. In theory, we will have to go around an infinite number of tree nodes, but since our data set is finite, we will need to stop somewhere. We will simply set a default value for all people who are not on our list. It would be logical to assign them a zero value - people who are not on the list do not carry the DNA of the ancestor we need.

Accepting a person, a function for combining values from two ancestors and a default value, the reduceAncestors function “condenses” a value from a family tree.

function reduceAncestors(person, f, defaultValue) {
  function valueFor(person) {
    if (person == null)
      return defaultValue;
    else
      return f(person, valueFor(byName[person.mother]),
                       valueFor(byName[person.father]));
  }
  return valueFor(person);
}

The internal valueFor function works with one person. Thanks to recursive magic, she can call herself to process the father and mother of this person. The results, together with the person object, are passed to f, which calculates the desired value for this person.

Now we can use this to calculate the percentage of DNA that my grandfather shared with Powels baths Haverbeke and divide it into four.

function sharedDNA(person, fromMother, fromFather) {
  if (person.name == "Pauwels van Haverbeke")
    return 1;
  else
    return (fromMother + fromFather) / 2;
}
var ph = byName["Philibert Haverbeke"];
console.log(reduceAncestors(ph, sharedDNA, 0) / 4);
// → 0.00049

A person named Pauwels vann Haverbeke obviously shares 100% of the DNA with Pauwels vann Haverbeke (there are no full namesakes in the data list), so the function returns 1 for him. Everyone else shares the average percentage of their parents.

Statistically, I have about 0.05% of the DNA that matches my ancestor from the 16th century. This, of course, is an approximate number. This is pretty small, but since our genetic material is about 3 billion base pairs, I have something from my ancestor.

It would be possible to calculate this number without using reduceAncestors. But the separation of the general approach (bypassing the tree) and the specific case (DNA counting) allows us to write more understandable code and use parts of the code again for other tasks. For example, the following code finds out the percentage of known ancestors of a given person who lived to be 70 years old.

function countAncestors(person, test) {
  function combine(person, fromMother, fromFather) {
    var thisOneCounts = test(person);
    return fromMother + fromFather + (thisOneCounts ? 1 : 0);
  }
  return reduceAncestors(person, combine, 0);
}
function longLivingPercentage(person) {
  var all = countAncestors(person, function(person) {
    return true;
  });
  var longLiving = countAncestors(person, function(person) {
    return (person.died - person.born) >= 70;
  });
  return longLiving / all;
}
console.log(longLivingPercentage(byName["Emile Haverbeke"]));
// → 0.145

No need to take such calculations too seriously, since our set contains an arbitrary sample of people. But the code demonstrates that reduceAncestors is a useful part of a general vocabulary for working with a data structure such as a family tree.

Binding

The bind method, which all functions have, creates a new function that will call the original, but with some fixed arguments.

The following example shows how this works. In it, we define the function isInSet, which says whether there is a person’s name in a given set. To call filter, we can either write an expression with a function that calls isInSet, passing it a set of strings as the first argument, or partially use the isInSet function.

var theSet = ["Carel Haverbeke", "Maria van Brussel",
              "Donald Duck"];
function isInSet(set, person) {
  return set.indexOf(person.name) > -1;
}
console.log(ancestry.filter(function(person) {
  return isInSet(theSet, person);
}));
// → [{name: "Maria van Brussel", …},
//    {name: "Carel Haverbeke", …}]
console.log(ancestry.filter(isInSet.bind(null, theSet)));
// → … same result

The bind call returns a function that calls isInSet with the first argument to theSet, and subsequent arguments the same as those passed to bind.

The first argument, which is now set to null, is used for method calls - just as it was in apply. We will talk about it later.

Total

The ability to pass a function call to other functions is not just a toy, but a very useful JavaScript property. We can write expressions with spaces in them, which will then be filled in using the values returned by the functions.

Arrays have several useful higher-order methods - forEach to do something with each element, filter - to build a new array where some values are filtered, map - to build a new array, each element of which is passed through a function, reduce - for combination all elements of the array into one value.

Functions have an apply method to pass arguments to them as an array. They also have a bind method for creating a copy of a function with partially specified arguments.

Exercises

Convolution

Use the reduce method in combination with concat to collapse an array of arrays into a single array that has all the elements of the input arrays.

var arrays = [[1, 2, 3], [4, 5], [6]];
// Ваш код тут
// → [1, 2, 3, 4, 5, 6]

Age difference between mothers and their children

Using the data set from the example, calculate the average age difference between mothers and their children (this is the age of the mother at the time the baby was born). You can use the average function in the chapter.

Please note - not all mothers mentioned in the kit are present in it. The byName object may come in handy, which simplifies the process of finding a person by name.

function average(array) {
  function plus(a, b) { return a + b; }
  return array.reduce(plus) / array.length;
}
var byName = {};
ancestry.forEach(function(person) {
  byName[person.name] = person;
});
// Ваш код тут
// → 31.2

Historical Life Expectancy

We believed that only the last generation of people lived to be 90 years old. Let's look at this phenomenon in more detail. Calculate the average age of people for each of the centuries. Assign to a century of people, taking their year of death, dividing it by 100 and rounding: Math.ceil (person.died / 100).

function average(array) {
  function plus(a, b) { return a + b; }
  return array.reduce(plus) / array.length;
}
// Тут ваш код
// → 16: 43.5
//   17: 51.2
//   18: 52.8
//   19: 54.8
//   20: 84.7
//   21: 94

As a bonus game, write a groupBy function that abstracts the grouping operation. It should take an array and a function that counts the group for the elements of the array, and return an object that maps the names of the groups to the arrays of members of these groups.

Every and some

Arrays have standard methods every and some. They take as an argument a certain function that, when called with an array element as an argument, returns true or false. Just as && returns true only if the expressions on both sides of the operator return true, the every method returns true when the function returns true for all elements of the array. Accordingly, some returns true when the given function returns true when working with any of the elements of the array. They do not process more elements than necessary - for example, if some gets true for the first element, it does not process the remaining.

Write every and some functions that work just like these methods, only accept an array as an argument.

// Ваш код тут
console.log(every([NaN, NaN, NaN], isNaN));
// → true
console.log(every([NaN, NaN, 4], isNaN));
// → false
console.log(some([NaN, 3, 4], isNaN));
// → true
console.log(some([2, 3, 4], isNaN));
// → false

Tags: