Pozadi September 23, 2014 at 18:00

Transducers in JavaScript. Part two

In the first part, we focused on the following specification: A transducer is a function that takes a function stepand returns a new function step.

step⁰ → step¹

The function step, in turn, takes the current result and the next element, and should return a new current result. Moreover, the data type of the current result is not specified.

result⁰, item → result¹

To get a new current result in a function step¹, you need to call the function step⁰, passing in it the old current result and the new value that we want to add. If we do not want to add a value, then we simply return the old result. If we want to add one value, then we call step⁰, and what it returns is returned as a new result. If we want to add several values, then we call it step⁰several times along the chain, it is easier to show on the example of the flatten transducer implementation :

function flatten() {
  return function(step) {
    return function(result, item) {
      for (var i = 0; i < item.length; i++) {
        result = step(result, item[i]);
      }
      return result;
    }
  }
}
var flattenT = flatten();
_.reduce([[1, 2], [], [3]], flattenT(append), []); // => [1, 2, 3]

Those. you need to call it stepseveral times, each time saving the current result to a variable, and passing it on the next call, and at the end return the final one.

As a result, it turns out that when processing each element, one function stepcalls another, and the next one, and so on until the last utility function step, which already saves the result to the collection ( appendfrom the first part).

So now we can:

Change elements (note map)
Skip items (approx. Filter)
Issue several new ones for one element (approx. Flatten)

Premature termination

But what if we want to interrupt the whole process in the middle? Those. implement take , for example. To do this, Rich suggests wrapping the return value in a special “reduced” wrapper.

function Reduced(wrapped) {
  this._wrapped = wrapped;
}
Reduced.prototype.unwrap = function() {
  return this._wrapped;
}
Reduced.isReduced = function(obj) {
  return (obj instanceof Reduced);
}
function take(n) {
  return function(step) {
    var count = 0;
    return function(result, item) {
      if (count++ < n) {
        return step(result, item);
      } else {
        return new Reduced(result);
      }
    }
  }
}
var first5T = take(5);

If we want to complete the process, then, instead of returning the next one resultas usual, we return it result, wrapped in Reduced. Immediately update the signature of the step function:

result⁰, item → result¹ | reduced(result¹)

But the function will _.reduceno longer be able to handle such a version of transducers. Have to write a new one.

function reduce(coll, fn, seed) {
  var result = seed;
  for (var i = 0; i < coll.length; i++) {
    result = fn(result, coll[i]);
    if (Reduced.isReduced(result)) {
      return result.unwrap();
    }
  }
  return result;
}

Now you can apply the transducer first5T.

reduce([1, 2, 3, 4, 5, 6, 7], first5T(append), []);    // => [1, 2, 3, 4, 5]

Still it is necessary to add check Reduced.isReduced(result)in transducers which cause step several times (approx. Flatten). Those. if in flatten with the step call again we are returned the result wrapped in Reduced, we must complete our cycle and return this wrapped result.

condition

Another important detail, the take transducer has a state. He remembers how many elements have already passed through him. For everything to work correctly, this counter must be created exactly in the place where it was created in the example (see var count), i.e. inside the function that returns step. If it were, for example, a global variable, then we would count the elements for all transducers of type take in one counter, and we would get the wrong result.

Let's create another utility function for launching transducers in order to clearly show the moment where the state is created.

function transduce(transducer, append, seed, coll) {
  var step = transducer(append);  // В момент вызова этой функции создаются состояния.
                                  // step содержит в себе счетчик,
                                  // и его (step) следует использовать только в рамках
                                  // этого цикла обработки коллекции, после чего уничтожить.
  return reduce(coll, step, seed);
}
transduce(first5T, append, [], [1, 2, 3, 4, 5, 6, 7]);    // => [1, 2, 3, 4, 5]

Completion

We have already talked about premature termination, but there may be a normal termination when the original collection simply ends. Some transducers may somehow handle completion.

For example, we want to break a collection into small collections of a given length, but if there aren’t enough elements for the last small collection, then simply return an incomplete one. It is necessary to somehow understand that there will be no more elements, and return what is.

So that Rich can do this, he suggests adding another version of the step function, to which the next value is not passed, but only the current result is transferred. This option will be called at the end of collection processing if there was no premature termination.

In clojure, these two functions are combined into one; we can do this in JavaScript too.

function step(result, item) {
  if (arguments.length === 2) { // обычный вызов
    // возвращаем step(result, item) или что вам нужно
  }
  if (arguments.length === 1) { // завершительный вызов
    // Здесь необходимо вызвать step c одним аргументом, чтобы передать завершающий сигнал дальше.
    // Но если мы хотим что-то добавить в коллекцию в конце,
    // то мы должны сначала вызвать step с двумя аргументами, а потом с одним.
    // ничего не добавляем
    return step(result);
    // что-то добавляем
    result = step(result, что-то);
    return step(result);
  }
}

Update the signature of the step function, now it has two options, depending on the number of arguments:

result⁰ → result¹ *
result⁰, item → result¹ | reduced(result¹)
* я не уверен может ли здесь возвращаться reduced(result¹), из выступления Рича это не ясно. Будем пока считать что не может.

All transducers must support both operations - the usual step and the final call. Function well transduce()and append()need to update, adding support for the finishing of the call.

function transduce(transducer, append, seed, coll) {
  var step = transducer(append);
  var result = reduce(coll, step, seed);
  return step(result);
}
function append(result, item) {
  if (arguments.length === 2) {
    return result.concat([item]);
  }
  if (arguments.length === 1) {
    return result;
  }
}

So here is the partition implementation (breaks the collection into small collections):

function partition(n) {
  if (n < 1) {
    throw new Error('n должен быть не меньше 1');
  }
  return function(step) {
    var cur = [];
    return function(result, item) {
      if (arguments.length === 2) {
        cur.push(item);
        if (cur.length === n) {
          result = step(result, cur);
          cur = [];
          return result;
        } else {
          return result;
        }
      }
      if (arguments.length === 1) {
        if (cur.length > 0) {
          result = step(result, cur);
        }
        return step(result);
      }
    }
  }
}
var by3ItemsT = partition(3);
transduce(by3ItemsT, append, [], [1,2,3,4,5,6,7,8]);   // => [[1,2,3], [4,5,6], [7,8]]

Initialization

Rich also suggests adding the ability for transducers to create an initial empty result value. (We everywhere used an empty array for these purposes, which we explicitly passed first to reduce, and then to transduce.)

To do this, we need to add another variant of the step function - without any parameters at all. If step is called without parameters, it should return the initial value, for example, an empty array.

Obviously, transducers cannot create an empty array, since they are not tied to the type of collection being processed. But besides the step function in transducers, there is also an external function step, which, just, knows about the type of collection. In our examples, this is the append function.

Update the function signature step.

→ result
result⁰ → result¹
result⁰, item → result¹ | reduced(result¹)

Update functions transduce()andappend()

function transduce(transducer, append, coll) {
  var step = transducer(append);
  var seed = step();
  var result = reduce(coll, step, seed);
  return step(result);
}
function append(result, item) {
  if (arguments.length === 2) {
    return result.concat([item]);
  }
  if (arguments.length === 1) {
    return result;
  }
  if (arguments.length === 0) {
    return [];
  }
}

And for example, rewrite the map transducer generator.

function map(fn) {
  return function(step) {
    return function(result, item) {
      if (arguments.length === 2) {
        return step(result, fn(item));
      }
      if (arguments.length === 1) {
        return step(result);
      }
      if (arguments.length === 0) {
        return step();
      }
    }
  }
}

It turns out that we just moved an empty array from the parameter transduce()inside append(), at first glance this is an unnecessary action, but it gave us the opportunity to create transducers that add something to the beginning of the collection (like those that add to the end, just the opposite).

Thus, all transducers must support three operations in the step function — the normal step, the final call, and the initial call. But most of them will simply transfer the initiative to the next transducer in the last two cases.

Summary

That's all. I retold the entire report of Rich Hickey . And, as I understand it, this is still all that can be said about transducers.

To summarize again what we got. We got a universal way to create operations on collections. These operations can: change elements (map), skip elements (filter), propagate elements (flatten), have a state (take, partition), prematurely complete processing (take), add something at the end (partition) and add something- then first. We can easily combine all these operations using compose, and use them both on regular collections, such as in FRP. In addition, all this will work quickly and consume little memory, because no temporary collections are created.

This is all cool! But how do we start using them? The problem is that in order to use transducers to the maximum, the JavaScript community must agree on the specification (and we can, yes? :-). Then a cool scenario could be realized in which libraries for working with collections (underscore, etc.) would be able to create transducers, and other libraries that are not entirely about collections (e.g. FRP) would simply support transducers.

The specs that Rich offers at first glance are pretty good for JavaScript, with the exception of the Reduced details. The fact is that Clojure already has global Reduced (it's been there for a long time), but not in JavaScript. It is, of course, easy to create, but each library will create its own Reduced. As a result, if, for example, I want to add support for transducers in Kefir.js, I will have to add support for transducers-underscore, transducers-LoDash, etc. Reduced is a weak point in the specifications offered by Rich.

Another scenario is the emergence of different libraries about transducers, each of which will have its own specification. Then we can get only part of the benefits. There is already a transducers.js library, it certainly has created its own Reduced, and there is no support for the final and initial calls, and it is not known in what form the author will add them.

Well, given the fact that many transducers do not seem to be something new and very useful, it is not yet clear how we will be, or whether we will use them in JavaScript.

Tags: