GRaAL March 23, 2015 at 12:32

Pattern-matching (another) in coffeescript

From the sandbox

Introduction

Once I sat and looked sadly at the Erlang code written as part of the study. I really wanted to write something more useful on it than tic-tac-toe, but as luck would have it never occurred to me. But there is JavaScript in which there are first-order functions, and currying, and map / filter / fold, and, most importantly, the task is much easier to come up with. But there is no pattern matching. A quick search gave me several libraries, but the syntax they proposed seemed to me heavy. Is it possible to make more concise, closer to the native Erlang syntax?

Spoiler: you can, if you take coffeescript, do this:

fn = Match -> [
  When {command: “draw”, figure: @figure = {type: “circle”, radius: @radius}}, -> 
    console.log(@figure, @radius)
  When {command: “draw”, figure: @figure = {type: “polygon”, points: [@first, @second | @rest]}}, -> 
    console.log(@figure, @first, @second, @rest);
]
fn {command: “draw”, figure: {type: “circle”, radius: 5, color: “red”}}
#output: {type: “circle”, radius: 5, color: “red”} 5

Who cares how this happened - welcome to cat.

Short description

Actually, what happens here, in short:

Match takes a function that returns an array of patterns and their corresponding actions;
When called, the context is replaced, so that all these @a (== this.a) point to specially prepared properties (properties), allowing the parser to understand which values to bind to;
Then, when calling with a specific value, a comparison is made with a template (for now, we can assume that there is a recursive traversal of the template and the value and comparison of specific values, a little more will be given more on this);
If the value matches the pattern, the action function is called. In her, we will also change the context, substituting the appropriate values.

So if we take the example above, then @radius in the first When argument indicates which part of the input object should be removed (in this case .figure.radius), and in the second argument (function) it also contains a specific value.

Work with arrays

Erlang has a convenient syntax for splitting a list into a head (the first element) and a tail (everything else), which is widely used for various recursive algorithms.

case List of
    [Head | Tail] -> Head;
    [] -> {empty};
end.

In javascript (and coffeescript), there is no way to override operators, so using regular means you can only do something like:

When [@head, @tail…], -> console.log(@head, @tail)

In principle, not bad, but in erlang it is somehow prettier. Maybe somehow it is possible, at least for a number of scenarios?
Here it is worth remembering how generally javascript manages to perform an operation like:

var object1 = {x:1}, object2 = {y: 2};
console.log(object1 | object2);

And get 0. This works because javascript first tries to cast the type to numeric, which calls the valueOf method on the object. If we substitute the method for our objects and return the power of two, as a result, we can find out which objects the operation was applied to.

var object1 = {x:1}, object2 = {y: 2}, object3 = {z: 3};
object1.valueOf = function() { return 2; }
object2.valueOf = function() { return 4; }
object3.valueOf = function() { return 8; }
console.log(object1 | object2); 
//6 == 2 | 4 == object1 and object2

A bold assumption was made that it is very rare that anyone will use arrays of specific numbers in templates, so if the parser encounters a number at the end of the array, it tries to determine if this is the result of the or operation of two objects. As a result, it became possible to write like this:

When [@head | @tail], -> console.log(@head, @tail)

It looks nice, but outside of this task, I would not use this method everywhere.

Class mapping

The next thing I wanted was to do a structure mapping as in Erlang, indicating the type and content.

#person{name = Name}

Of course, it will not succeed directly, but something similar can be done. In the end, I settled on three solutions:

When ObjectOf(Point1, {x: 1, y: 2}), -> …
When Point2(x:1, y:2), -> …
When Point3$(x:1, y:2), -> ...

The first works out of the box, the second looks almost like a case class on scala, but requires the insertion of such a line into the constructor.

class Point2
  constructor: (@x, @y) ->
    return m if m = ObjectOf(@, Point2, arguments)

This is needed to understand whether the function was called as a constructor or not, the constructor and arguments fall into the template.

The third option is a variation on the theme of the first, just pre-provision the function:

Point3$ = ObjectOf(Point3)

Performance

The first naive version performed a comparison of a template and a value, passing them recursively. In principle, I expected that performance would not be up to par when compared to a simple parsing of an object. But it was worth checking.

Manual parsing

coffeeDestruct = (demo) ->
 {user} = demo
 return if not user.enabled
 {firstname, group, mailbox, settings} = user
 return if group.id != "admin"
 notifications = settings?.mail?.notify ? []
 return if mailbox?.kind != 'personal'
 mailboxId = mailbox?.id ? null
 {unreadmails, readmails} = mailbox;
 return if unreadmails.length < 1
 firstUnread = unreadmails?[0] ? []
 restUnread = unreadmails?.slice(1) ? []
 return if readmails?.length < 1
 return if readmails?[0]?.subject != "Hello"
 rest = readmails?.slice(1)
 return {firstname, notifications, firstUnread, restUnread, rest, mailboxId}

Template

singlePattern = Match -> [
 When {user: {
   firstname: @firstname,
   enabled: true,
   group: {id: "admin"},
   settings: {mail: {notify: @notifications}},
   mailbox: {
     id: @mailboxId,
     kind: "personal",
     unreadmails: [
       @firstUnread | @restUnread
     ],
     readmails: [
       {subject: "Hello"}, Tail(@rest)
     ]
   }
 }}, -> "ok"
]

Results for 10,000 comparisons:

Regular: 5ms
Single Pattern: 140ms
Multiple patterns: 429ms

Yes, not what I want to see in production. Why not convert the template to code close to the first example?

No sooner said than done. We follow the template and make a list of conditions and intermediate variables.

An interesting feature came out here. The first version of the compiled function was almost identical to handwritten parsing, but productivity was worse than one and a half times. The difference was in the way the resulting object was created: it turned out that creating an object with the given fields was cheaper than creating an empty object and filling it in later on. To check, I made this benchmark . Then found two articles on this topic - here and here - and also the transfer to Habré .

Conducted optimization, run:

Regular: 5ms
Single Pattern: 8ms
Multiple patterns: 164ms

The second number looks good, but what is the third and why is it still large? The third is a match expression with multiple patterns where only the last one fires. Since templates are compiled independently, we get a linear dependence on the number of templates.

But in reality, the templates will be very similar - we will disassemble objects that differ in some details, and at the same time having a similar structure. Let's say here:

fn = Match -> [
 When ["wait", "infinity"], -> console.log("wait forever")
 When ["wait", @time = Number], -> console.log("wait for " + this.time + "s")
]

In both cases, the array consists of two elements and the first is “wait”, the differences are only in the second. And the parser will make two almost identical functions and will call them one by one. Let's try to combine them.

The meaning is simple:

The parser, instead of the code, will issue chains of "commands";
Then all the teams are taken and assembled in one chain with branches;
Now the teams are turning into code.

It is worth noting that if we went into one chain, then in case of failure we should not go out, but try the next chain. I saw three ways to achieve this:

1. Nested conditions

if (Array.isArray(val)) {
   if (val.length === 2) {
      if (val[0] === ‘wait’) {
         if (val[1] === ‘infinity’) {
            return {when: 0};
          }
         if (val[1].constructor.name === ‘Number’) {
          return {when: 1};
         }
      }
   }
}

It looks awful, and even when generating the code, you would not get confused in brackets. Not.

2. Nested Functions

if (!Array.isArray(val)) return false;
if (val.length !== 2) return false;
if (val[0] !== ‘wait’) return false;
if (res = function fork1() {
    if (val[1] !== ‘infinity’) return false;
    return {when: 0}
}()) return res;
if (res = function fork2() {
    if (val[1].constructor.name !== ‘Number’) return false;
    return {when: 1};
}()) return res;

It looks better. But additional checks and returns are straining, because there is no way to return immediately from an external function (well, except perhaps through exceptions, but this is not serious).

3. Breaking ~~bad~~ label

if (!Array.isArray(val)) return false;
if (val.length !== 2) return false;
if (val[0] !== ‘wait’) return false;
fork1: {
    if (val[1] !== ‘infinity’) break fork1;
    return {when: 0}
}
fork2: {
    if (val[1].constructor.name !== ‘Number’) break fork2;
    return {when: 1};
}

It looks good and it seemed to me as an old opener that this option would be faster. A quick jsperf check confirmed my hunch. So we will stop on this option.

Let's look at the performance:

Regular: 5ms
Single Pattern: 8ms
Multiple patterns: 12ms

It is quite acceptable. Leave as is.

Architecture and plugins

After adding another functionality by adding new ifs in two different places, I decided to redesign the architecture. Now, instead of two large functions, parse and render, there are smaller parse and render functions that themselves do not really do anything, but each part of the template is sent through a chain of plugins. Each plugin can:

process your piece of the template and turn it into a set of commands;
say what to parse next;
turn your teams into code.

For example, a plugin for matching with a constructor looks like this:

function pluginConstructor() {
 return {
   //этот метод вызывается если в шаблоне встретилась функция
   //если бы мы ждали объект, то нужно было бы объявить метод parse_object
   //или метод parse, который вызвался бы для любого значения
   parse_function: function(part, f) {
     //добавляем команду с именем "constructor"
     //после перегруппировки нас попросят превратить команду в код - см. метод ниже
     f.addCheck("constructor", part.name);
   },
   //этот метод вызывается чтобы "отрисовать" код для команды "constructor"
   //мы возвращаем условие, оно будет завернуто в if.
   render_constructor: function(command, varname) {
       return varname + ".constructor.name === " + JSON.stringify(command.value);
   }
 };
}

This allowed me, on the one hand, to simplify the addition of new features, and on the other hand, to enable users to add their own plug-in and expand the syntax of templates. For example, you can add regular expression support so that you can write like this:

fn = Match -> [
  When @res = /(\d+):(\d+)/,            -> hours: @res[1], mins: @res[2]
  # or
  When RE(/(\d+):(\d+)/, @hours, @min), -> hours: @hours, mins: @mins
]

Comparison with other solutions

As I wrote at the very beginning, I tried to look for similar solutions, and here is what was found:

matches.js - similar functionality, close performance, but the templates are set as a string - therefore, neither highlighting nor putting common parts into variables
sparkler’s ideological successor - it seems to have similar functionality, but uses sweet.js macros for syntax, i.e. the code will have to be compiled further. In general, the problems are the same, although the templates look prettier
pun.js - the syntax is similar (only in templates instead of @a it is suggested to write $ ('a')), however there are fewer options (for example, variable length arrays are not supported) and performance is lower - apparently they do not compile.

A performance comparison with matches.js, pun and manual parsing can be found here .

Conclusion

That's all. The code itself can be viewed here . Despite the fact that the syntax is sharpened by coffeescript, the library itself is written in javascript and can be used in other languages compiled in js.

A few minuses after:

Partitioning arrays into “head” and “tail” is useful for recursive algorithms, but without optimization of tail recursion, there may be problems with performance and stack overflow on large volumes.
Solution: not yet come up
It’s impossible to use functions in templates - or rather, you can use them, but they are called only once when compiling the template.
Solution: use guards
Because of this context substitution, you cannot attach action functions to your context. On the other hand, if we are writing in a functional style, then it seems that we do not need method calls.
Solution: the old fashioned way, self = this
For the same reason, most likely it will not be possible to use arrow-functions from ecmascript 6 - they tightly bind the context so that even calls through call / apply do not affect them.
Solution: not yet come up

I hope something is useful. Thanks for attention.

Tags: