Stop being cute and clever

This text is a translation of the article 'Stop Being Cute and Clever' by Armin Ronacher, well known (at least in the Python community).

Over the last few days, in my free time, I was building a scheduling tool. The idea was simple: create a clone of World Time Buddy using AngularJS and a few other JavaScript libraries.

And you know what? It was no fun at all. I haven't been this angry while working on something in a long time, which says a lot, because I usually vent my frustration quickly (my apologies to my Twitter followers).

I use JavaScript regularly, but I've rarely had to deal with other people's code. I usually stick to jQuery, underscore, and occasionally AngularJS. This time, however, I went all-in and decided to use various third-party libraries.

For this project I used jQuery, which you can't get around anyway (and why would you?), and AngularJS with some UI components (angular-ui and jQuery UI bindings). For time zone handling, I used moment.js.

I want to note right away that I am not out to criticize anyone's specific code. Moreover, if someone looks at my own JavaScript sources, they will find code that is sometimes only slightly better and sometimes worse, because I didn't spend much time on it, and I don't have much experience with the language anyway.

However, I noticed an alarming tendency toward awful-quality code in JavaScript libraries (at least the ones I use), and I wondered why that was happening.

I ran into a lot of problems with JS libraries, and all of them stemmed from the fact that nobody seems to give a damn about the quirks of the language.

The reason I started actively digging into third-party JavaScript code was that my naive attempt to feed 3 MB of city names to the typeahead.js autocompletion library resulted in an incredibly slow UI. Obviously, no sane person would ship that much data into an autocompletion field at once; you would filter it on the server side first. But the problem wasn't slow data loading, it was slow filtering, and that I could not understand: even a linear scan over 26,000 items should not be that slow.
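
To put that claim in perspective, here is a quick back-of-the-envelope sketch (mine, not the article's code): a single linear pass over 26,000 strings takes mere milliseconds.

var names = [];
for (var i = 0; i < 26000; i++) names.push('city-' + i);

console.time('linear filter');
var hits = names.filter(function (n) {
    return n.indexOf('san') !== -1;   // one substring check per item
});
console.timeEnd('linear filter');     // typically a few milliseconds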

Background


So, the interface was slow; obviously, my mistake was sending too much data. The interesting part, though, was that the slowdown happened inside the typeahead widget, and sometimes in a very peculiar way. To show how crazy it was, here are some initial measurements:

  1. Search for San Francisco by typing "san". ~200 ms.
  2. Search for San Francisco by typing "fran". ~200 ms.
  3. Search for San Francisco by typing "san fran". About a second.
  4. Search for San Francisco by typing "san" again. About a second.

What was going on? How does a search get slower when you search for something more than once?

The first thing I did was fire up the new Firefox profiler to see what was taking so much time, and I very quickly found a bunch of exceedingly weird things in typeahead.

The bottleneck turned up pretty quickly. The problem was an epic miss in the choice of data structure, combined with a strange algorithm. The matching logic is fancy and includes such wonderful things as walking a list of strings and then checking each of them for membership in other lists, including the original one. When the first list has 6,000 items and each one triggers a linear search just to confirm that the item really is in the list, it all takes a very long time.

Yes, mistakes happen, and if you only ever test with small amounts of data you won't even notice them. The thousands of cities and time zones I fed in were simply too much. Besides, nobody writes search functions every day, so I'm not blaming anyone.

But because I had to debug this thing, I came across some of the strangest code I have ever seen. Further digging showed that the same eccentricities appear not only in typeahead.

Based on this, I am now convinced that JS is a kind of Wild West of software development. First of all because its quality competes with PHP code from 2003, but apparently fewer people care, since it runs on the client side rather than the server: nobody has to pay for slow-running JavaScript.

Clever code


The first pain point is people who treat JS as a language for cute and 'clever' code. That makes me ridiculously paranoid when reviewing code and hunting for bugs: even if you know the idioms used, you cannot be sure whether a side effect is intentional or someone just made a mistake.

As an example, here is a piece of typeahead.js:
_transformDatum: function(datum) {
    var value = utils.isString(datum) ? datum : datum[this.valueKey],
        tokens = datum.tokens || utils.tokenizeText(value), 
        item = {
            value: value,
            tokens: tokens
        };
    if (utils.isString(datum)) {
        item.datum = {};
        item.datum[this.valueKey] = datum;
    } else {
        item.datum = datum;
    }
    item.tokens = utils.filter(item.tokens, function(token) {
        return !utils.isBlankString(token);
    });
    item.tokens = utils.map(item.tokens, function(token) {
        return token.toLowerCase();
    });
    return item;
}

This is just one function, but it caught my attention for many reasons. All it does is convert a datum object into a list item. What is a datum? That's where the fun begins. It seems the library's author revised his approach at some point. Originally the function must have taken a string and wrapped it in an object with a value attribute (also a string) and an array of tokens. Now, however, the returned object is a wrapper around the datum object (or string) with a completely different interface. A bunch of data gets copied, and some attributes are simply renamed.

Suppose that an object of the following form arrives at the input:
{
    "value": "San Francisco",
    "tokens": ["san", "francisco"],
    "extra": {}
}

Then it transforms into this:
{
    "value": "San Francisco",
    "tokens": ["san", "francisco"],
    "datum": {
        "value": "San Francisco",
        "tokens": ["san", "francisco"],
        "extra": {}
    }
}

I can understand how the code ended up this way, but reading a completely different part of the codebase, it is entirely unclear why my datum object has turned into another object that nevertheless contains the same data. Worse: the memory used by the object doubles, because the tokens are copied by the array operations. It turns out I could have just sent datum objects in the right format to begin with, cutting memory consumption by 10 MB.

But such code is quite typical for JavaScript, and it is frustrating: it is unclear, it is strange, it carries no type information. And it is too clever.

It just operates on objects. You cannot ask an object: datum, are you in the right format? It is just an object. Digging through the implementation details, it turned out you could feed a whole zoo of different types into the input and everything would keep working, just doing something slightly different at the start and breaking much later. The amount of malformed input JS code can chew through while still producing some kind of result is impressive.

Not only does the code lack typing, it also frivolously abuses operators and functional programming. Words cannot convey how suspicious this style of writing JS makes me, given how strangely map works: not many languages manage to implement map in such a way that ["1", "2", "3"].map(parseInt) results in [1, NaN, NaN].
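
For the record, the sketch below shows why that happens: map passes three arguments to its callback (value, index, array), and parseInt interprets the second of them as a radix.

["1", "2", "3"].map(parseInt);
// parseInt("1", 0) -> 1    (radix 0 means "auto-detect", so base 10)
// parseInt("2", 1) -> NaN  (radix 1 is invalid)
// parseInt("3", 2) -> NaN  ("3" is not a binary digit)

["1", "2", "3"].map(function (s) { return parseInt(s, 10); });
// [1, 2, 3]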

Operator abuse is widespread. A little further down sits this wonderful piece of code:
_processData: function(data) {
    var that = this, itemHash = {}, adjacencyList = {};
    utils.each(data, function(i, datum) {
        var item = that._transformDatum(datum), id = utils.getUniqueId(item.value);
        itemHash[id] = item;
        utils.each(item.tokens, function(i, token) {
            var character = token.charAt(0), adjacency =
                adjacencyList[character] || (adjacencyList[character] = [ id ]);
            !~utils.indexOf(adjacency, id) && adjacency.push(id);
        });
    });
    return {
        itemHash: itemHash,
        adjacencyList: adjacencyList
    };
}

For reference: utils.indexOf is a simple linear search over an array, and utils.getUniqueId returns a monotonically increasing integer as an identifier.

Obviously, the author of this code knew about O(1) hash tables, otherwise he would not be putting items into a hashmap. And yet, a few lines further down, a linear search runs before an item is placed into the list. Throw 100,000 tokens at this code and it gets very slow, believe me.

I also want to draw attention to this loop:
utils.each(item.tokens, function(i, token) {
    var character = token.charAt(0), adjacency =
        adjacencyList[character] || (adjacencyList[character] = [ id ]);
    !~utils.indexOf(adjacency, id) && adjacency.push(id);
});

I am quite sure the author was very proud of it. For starters, why write it that way? Is !~utils.indexOf(...) && ... really a worthy replacement for if (utils.indexOf(...) >= 0)? Not to mention that a hashmap of adjacency lists is called adjacencyList... Or that each list is initialized with the item's id and then immediately subjected to a linear search to find that very element again. Or that values end up in the hash table via a boolean test on the list and an 'or' operator performing the assignment.
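
For comparison, here is a minimal sketch (my own, not the library's) of how the membership check could avoid the linear scan by tracking already-seen ids in a plain object:

function addAdjacency(adjacencyList, seen, token, id) {
    var character = token.charAt(0),
        adjacency = adjacencyList[character] || (adjacencyList[character] = []),
        key = character + ':' + id;
    if (!seen[key]) {          // O(1) object lookup instead of indexOf's O(n) scan
        seen[key] = true;
        adjacency.push(id);
    }
}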

Another common hack is the unary + operator (which reads as a no-op if you come from most other languages) used to turn a string into a number: +value is roughly the same as parseInt(value, 10).
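
Roughly, but not exactly; a few data points, easy to verify in any console:

+'42'                  // 42
parseInt('42', 10)     // 42
+'42.5'                // 42.5  (unary plus parses floats)
parseInt('42.5', 10)   // 42    (parseInt stops at the dot)
+''                    // 0     (parseInt('', 10) is NaN)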

I have a theory that all this operator madness came from Ruby. But in Ruby it makes sense, since there are exactly two falsy values: false and nil; everything else is truthy. The whole language is built on that concept. In JS, many values are falsy. And then sometimes they are not.

For example, the empty string "" is falsy. Except when it is an object. And strings sometimes become objects by accident: jQuery's each function passes the current iteration value as this, but since this cannot refer to a primitive type, the string is passed wrapped in a String object.

So in some situations, the behavior can be very different:
> !'';
true
> !new String('');
false
> '' == new String('');
true
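
jQuery's each shows the boxing in practice; a small sketch (assuming jQuery is loaded):

$.each(['', 'foo'], function () {
    console.log(typeof this, !this);
});
// "object" false   <- even for the empty string, because it is boxed
// "object" false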


You can sympathize with operators in Ruby, but not in JavaScript. This is plain dangerous. It is not that I don't trust a person who has tested their code and knows what they are doing; it is that anyone else who later reads the code cannot tell whether the behavior was intended by the developer.

Using ~ to test the value returned by indexOf, which is -1 when the element is absent, is simply unreasonable. And please do not tell me it is "just as fast".

Working "live"


The dubious use of operators is one thing, but what really kills me is that the dynamic nature of JS gets taken to an absolute. To my taste even Python is an overly dynamic language, but Pythonistas at least quite sensibly keep runtime modification of classes and namespaces to a minimum. In the JavaScript world everything is different, and especially so in the world of AngularJS.

Classes do not exist; JS has objects that can sometimes have prototypes, although in practice people simply stuff functions into objects, and sometimes functions into functions. Strange object cloning is considered normal too, except perhaps when the object's state changes often.

Directives in Angular might seem fine, until you run into one that does almost what you need. In most cases a directive is monolithic, and the only way to change it is to add another directive with a higher priority that patches the previous one, as sketched below. I would not mind class inheritance becoming a thing of the past in favor of composition, but this kind of monkey patching is not my style.
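
As a sketch of that pattern (AngularJS 1.x, hypothetical directive name): Angular applies every directive registered under the same name, in priority order, so a second registration with a higher priority effectively patches the first.

var app = angular.module('app', []);

// The original, monolithic directive.
app.directive('datePicker', function () {
    return {
        priority: 0,
        link: function (scope, element) {
            element.text('stock behavior');
        }
    };
});

// The "patch": same name, higher priority, processed first.
app.directive('datePicker', function () {
    return {
        priority: 100,
        link: function (scope, element) {
            element.addClass('patched');
        }
    };
});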

The dynamic nature lets code quickly grow into an uncontrollable mass where nobody knows for sure what works, and not only because of the missing classes and types. The whole environment feels like something held together with electrical tape, with a thick layer of grease in the mechanism.

For example, Angular uses a change-tracking system over models and the DOM to synchronize them automatically. Not only is this damn slow, people also invent all sorts of workarounds to keep handlers from firing. Such logic becomes ridiculously convoluted very quickly.
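
A sketch of the kind of workaround I mean (AngularJS 1.x, assuming $scope is injected into a controller): a freshly registered watcher fires once immediately, so everyone guards against that first call.

$scope.$watch('model.value', function (newVal, oldVal) {
    if (newVal === oldVal) return;   // skip the initial "nothing changed" firing
    // ...react to an actual change...
});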

Immutability


The higher-level the programming language, the more immutable things tend to become. But not in JavaScript: APIs keep getting clogged with stateful concepts. Complaining about this in performance terms may be beside the point, but it gets annoying very quickly. Some of the most annoying bugs in my scheduler came from the mutable nature of moment objects: instead of returning a new object, foo.add('minutes', 1) modifies the original one. Yes, I knew about this; it is all described in the API documentation. But unfortunately I accidentally passed a reference somewhere, and the object got changed.
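
A minimal sketch of the trap (using moment.js's old add(unit, amount) signature, as in the text):

var start = moment('2013-12-09T10:00:00');
var end = start.add('minutes', 30);  // mutates start and returns it
start.format('HH:mm');               // "10:30" (the original changed too!)
end === start;                       // true: same object

// A defensive copy with clone() keeps the original intact.
var safeEnd = start.clone().add('minutes', 30);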

In theory, though, JS should be an excellent tool for building APIs around immutable objects, given that objects can be "frozen" at will, which is exactly what Python is missing. On the other hand, Python offers more tools for making immutable objects interesting: operator overloading, for example, and protocols that let such objects serve as hash table keys.
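
The freezing itself is real and built into ES5 as Object.freeze; a minimal sketch:

var point = Object.freeze({ x: 1, y: 2 });
point.x = 42;             // silently ignored (throws in strict mode)
point.x;                  // 1
Object.isFrozen(point);   // true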

"Useful magic"


I love Angular, very much. It is one of the smartest systems for building UIs in JavaScript, but the amount of magic in it is frightening. It starts with simple things: the library renames directives. If you create a directive fooBar, it lands in the DOM as foo-bar. Why? Presumably for consistency with the style DOM API, which did something similar earlier. But this makes code confusing, since you can never be quite sure what a directive is actually called. It also completely ignores the idea of namespaces: if two different Angular applications define directives with the same name, they will conflict.
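
A minimal illustration of the renaming (hypothetical directive, markup shown in comments):

// Registered under the camelCase name fooBar...
angular.module('app', []).directive('fooBar', function () {
    return {
        restrict: 'A',
        link: function (scope, element) { element.text('hello'); }
    };
});
// ...but matched in the DOM under the dash-delimited form:
//   <div foo-bar></div>   matches
//   <div fooBar></div>    does not (HTML lowercases it to foobar)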

By default, dependency injection in Angular works by converting a JS function to a string and then parsing out the argument names with a regular expression. If you are new to AngularJS this will make no sense to you at all, and even now the idea seems bad to me. It conflicts with an assumption people have long relied on in JS: local variables are anonymous, their names do not matter, which is exactly what minifiers have exploited for ages. Admittedly this does not fully apply to Angular, since there is an alternative way of declaring dependencies explicitly.
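
Both styles side by side, as a sketch (hypothetical controller name):

var app = angular.module('app', []);

// Implicit: Angular stringifies the function and regex-parses the
// parameter names, so a minifier that renames them breaks injection.
app.controller('ClockCtrl', function ($scope, $http) { /* ... */ });

// Explicit annotation survives minification: the names live in strings.
app.controller('ClockCtrl', ['$scope', '$http',
    function ($scope, $http) { /* ... */ }
]);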

Where are the layers?


One of the inconveniences of moving from Python to client-side JavaScript is the lack of abstraction layers. For example, Angular lets you access the parameters of the current URL as a dictionary. What it does not let you do is parse an arbitrary query string. Why? Because the internal parsing function is hidden under many layers of closures, and nobody thought it might be useful.

And this happens not only in Angular: JS itself has no function for escaping HTML, even though the DOM clearly needs such functionality in some cases. Because of this, some people quite seriously escape HTML like this:
function escapeHTML(string) {
    var el = document.createElement('span');
    el.appendChild(document.createTextNode(string));
    return el.innerHTML;
}

And parse URLs like this:
function getQueryString(url) {
    var el = document.createElement('a');
    el.href = url;
    return el.search;
}

This is crazy, but it is everywhere.

To some extent I can understand developers not wanting to expose low-level functions, but the result is that people invent all sorts of hacks just to duplicate existing functionality. It is common to see half a dozen implementations of the same operation in a large JS application.

“But it works”


PHP is so popular because it just works and does not take long to learn. A whole generation of developers started out with it, and that crowd had to painfully rediscover a pile of lessons from previous years. A certain herd mentality formed: when one person copied another's code, he did not think much about how it worked. I remember the days when the plugin landscape was insane and the main way to extend PHP applications was mod files. Some misguided soul started it all, and everyone followed. I am pretty sure that is how register_globals, the strange manual SQL escaping, and the whole concept of sanitizing input data instead of properly escaping output came about.

JS is largely the same. The generation of developers has changed, the problems have changed, but the mentality remains: concepts seen in one library are still copied into the next.

Even worse: since everything runs sandboxed on users' computers, nobody even thinks about security. And unlike PHP, performance does not matter much, because client-side JS "scales linearly" with the number of users running the application.

Future?


No, I am not entirely pessimistic about JavaScript. It is definitely improving, but I believe it will have to go through the same phase as PHP: people coming from other fields and other programming languages are forced to work with it and, slowly but surely, are bringing sound ideas into the community. The time will come when monkey-patching of prototypes disappears, stricter type systems arrive, people start thinking about concurrency, and a backlash against crazy metaprogramming sets in.

Something similar has happened in the Python community over the past few years. A few years ago metaclasses were the hot new thing; now that applications have grown larger and larger, many have changed their minds. When Django first appeared, its developers had to advocate using functions instead of classes; today almost nobody argues about it.

I just hope the JavaScript community needs less time to come to its senses than its predecessors did.

Armin Ronacher,
12/09/2013
