amarao September 19, 2013 at 15:30

Haskell in Production: A Project Manager's Report

I have long promised to write an article about how Haskell has performed on real tasks in production.

For those who haven't been following: at the beginning of 2012, Haskell was lobbied through at Selectel, and the programmers enthusiastically started introducing it. Back then I promised to publish a report on how usable “all this” turned out to be.

Production in a commercial project is not a small sandbox “for yourself” and not an academic experiment in Computer Science. It is an endless struggle in which there is hell, horror and death all around, yet things have to work anyway. Int64 in XML-RPC is encoded as a string (because XML-RPC ints are signed int32); openssl, when reading a file with several certificates, reads only the first one; a bool field takes either “1” or “0”, but sometimes “2”, because that was the only way anyone could come up with to add a third mode; and so on. Under these conditions, the requirements for a language gradually turn into requirements for its ecosystem, its infrastructure and its readiness to adapt to the real world.
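To make those quirks concrete, here is a minimal Haskell sketch of decoding two of them defensively. It is an illustration, not code from our project: `decodeInt64`, `decodeMode` and the `Mode` type are hypothetical names, and the XML parsing that would surround them is left out.

```haskell
-- Sketch of decoding two quirky wire conventions described above.
-- The names and field conventions are illustrative assumptions, not a real API.
import Data.Int (Int64)
import Text.Read (readMaybe)

-- A 64-bit integer that arrives as a <string>, because XML-RPC <int> is int32.
-- Parse as Integer first, then check the range, so out-of-range input is rejected.
decodeInt64 :: String -> Maybe Int64
decodeInt64 s = do
  n <- readMaybe s :: Maybe Integer
  if n >= toInteger (minBound :: Int64) && n <= toInteger (maxBound :: Int64)
    then Just (fromInteger n)
    else Nothing

-- A "boolean" field with a third mode smuggled in as "2".
data Mode = Off | On | ThirdMode
  deriving (Show, Eq)

decodeMode :: String -> Maybe Mode
decodeMode "0" = Just Off
decodeMode "1" = Just On
decodeMode "2" = Just ThirdMode
decodeMode _   = Nothing
```

Reading into `Integer` first is the deliberate choice here: reading straight into `Int64` would silently wrap an out-of-range value instead of rejecting it.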

I will write about Haskell from the perspective of a product owner, a project manager and a system administrator, but not a programmer. So don't expect sincere enthusiasm from me about how elegantly you can build a semigroupoid out of monads, or how cool it is to derive types from types using types.

From a project manager's point of view, a programming language is judged by metrics quite different from a programmer's. For a programmer, the language and its features are perhaps the most important thing, since that is what they spend most of their time with. For the rest of the team, what happens outside the source code matters far more: first the search for libraries and suitable technologies, then maintenance, monitoring, deployment and debugging.

Let's start with the consumer-facing properties.

Program execution speed

Haskell programs are faster than Python, PHP and Ruby (and other interpreted languages), and faster than Erlang and Java (and other VM-based languages). Haskell is usually slower than C, although I have seen several cases where the Haskell compiler produced faster code than the C compiler.

For any practical application, Haskell's performance is more than enough.

The main advantage over Python (which we were gradually migrating away from) is excellent support for parallel execution. No GIL, no “external balancers between workers”, no hell debugging gevent.

Haskell has built-in green threads and makes native use of operating system threads.
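A minimal sketch of what built-in green threads look like, using only `forkIO` and `MVar` from base. The names `work` and `runJobs` are hypothetical, and `work` is a stand-in for real per-request processing.

```haskell
-- Sketch: one lightweight (green) thread per job on the GHC runtime.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- Hypothetical stand-in for real per-request work.
work :: Int -> Int
work n = sum [1 .. n `mod` 1000]

-- Fork one green thread per job; each thread writes its result into its own MVar.
runJobs :: [Int] -> IO [Int]
runJobs jobs = do
  boxes <- forM jobs $ \j -> do
    box <- newEmptyMVar
    -- ($!) evaluates the result inside the worker thread, not in the caller.
    _ <- forkIO (putMVar box $! work j)
    pure box
  -- Block until every thread has delivered its result, collected in job order.
  mapM takeMVar boxes
```

Load it in GHCi and evaluate `runJobs [1 .. 10000]`, or call it from `main` in a binary built with `ghc -threaded -with-rtsopts=-N`, which lets the runtime spread the green threads across all CPU cores. Ten thousand `forkIO` threads are cheap because the GHC runtime schedules them itself rather than creating an OS thread for each.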

Executable file size

Most of the time it doesn't matter, but in a few places our configuration was tight on space, and a minimum executable size of 22 MB was annoying. Once those tight spots were resolved, size stopped playing any noticeable role. Our largest server binary is 44 MB and dynamically links against three dozen .so libraries.

Memory usage

(In this section we are talking about “resident” memory, that is, the memory in which data is stored, not code; in top it corresponds to the RES column.)

In algorithm analysis, memory usage is usually expressed in O-notation, but there is an important practical factor: if there are many processes and each of them is O(1), how much memory will they eat on the server? Those same “plebeian constants” suddenly start to matter.

Haskell's memory usage is comparable to that of Python programs. Our daemons (those that don't hold a significant amount of data) take from 9 to 20 megabytes. Python daemons are about the same.

I must say that on this metric Haskell is somewhat behind OCaml (whose production services can live in 1-2 megabytes of memory) and, of course, C (modd, for example, uses only 0.15 MB), but it is much better off than Java or Erlang.

Real executables

Most comfortable runtime environments (JVM, Python, beam.smp, PHP, Perl, .NET, etc.) require a fair amount of infrastructure: a running interpreter or virtual machine, a bunch of files in the right places, and so on. When you write a program that “takes two numbers from the user, writes them to the database and shows the project administrator their sum”, that's fine.

But sometimes you need to write a program that runs in single-user mode. Or instead of init. Or from init itself. Or with suid. Or in some other setting where there is no room to deploy a comfortable runtime environment.

Haskell produces an executable file. A real ELF binary, which can be statically or dynamically linked. And that's great.

The second important factor is startup speed. In many cases a program simply starts, does its job and exits. Python (and many other interpreted languages) scans a huge number of files at startup, especially with lots of imports, which adds 100-200 ms of delay at start. For Haskell this figure is much smaller, because the dynamic loader (ld.so) works many times faster than Python or PHP start up.

The same goes for ps/top output: Haskell programs are ordinary executables that show up as plain processes in the process list, not as python running some script.

There is a downside: 32-bit and 64-bit builds are, unsurprisingly, different executables, and libffi5 versus libffi6 is already a big difference, which ties an application to a particular distribution, or even to a particular version of the same distribution.

Monitoring

Since a Haskell program is “native” to the operating system, monitoring it needs nothing special (by comparison, the Java VM has its own metrics that have to be monitored, and so does Erlang).

Code quality

When you are using an already written program, exactly one thing interests you: how often it crashes, complains and breaks everything. Compared with Python, incomparably less often. Yes, with enough effort you can still get an exception that leaks up to the top level, but the probability of that is extremely small (I saw it once, across all our programs, in the whole time we have used them).

The likelihood of silly mistakes is significantly lower. And when I say “significantly”, I mean in practice, not in theory: we observed the same product, written mostly by the same people, in both Haskell and Python.

Python, like any other dynamically typed language, is one continuous time bomb. Every bad situation has to be thought through explicitly, and nothing protects you from minor local errors or carelessness. Errors either surface at runtime or get hidden in an implicit except: pass (which is even worse). “'NoneType' object has no attribute ...”, and that's it. And if the error sits in a rarely taken branch, the bomb's fuse turns out to be very long: it goes off when the code has long been “stable and proven” and has, say, 300 days of uptime.

Tests that “cover all the code” unfortunately do not cover “all possible types of input data” (which, surprise, are dynamic), and they do not protect against typos at all.

In Haskell, errors of the “oh, I forgot to check that in this branch” or “mixed up the return type” kind do not show up in programs. The programmers attribute this to a convenient type system that catches most such errors at compile time, plus a language that lets you write the essential logic without being distracted by array indices and temporary variables. They know better.
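A minimal sketch of the “forgot to check in this branch” case, assuming a hypothetical `Server` record and `ramOf` function. In Python, a lookup that may return None is usually discovered at runtime; here Prelude's `lookup` returns `Maybe Server`, so the missing case is part of the type and GHC can check that it is handled.

```haskell
{-# OPTIONS_GHC -Werror=incomplete-patterns #-}
-- Sketch: a missing value is part of the type, so "forgot to check"
-- becomes a compile-time error instead of a runtime NoneType crash.

-- Hypothetical record, for illustration only.
data Server = Server { serverName :: String, ramMb :: Int }
  deriving (Show, Eq)

-- Prelude's lookup returns Maybe Server: the result cannot be used
-- as a Server until the Nothing case has been handled.
ramOf :: [(String, Server)] -> String -> Either String Int
ramOf inventory key =
  case lookup key inventory of
    Just s  -> Right (ramMb s)
    -- Delete this branch and the pragma above turns the
    -- missing-case warning into a compile error.
    Nothing -> Left ("unknown server: " ++ key)
```

The pragma is one way to make the compiler refuse a forgotten branch outright; without it, the check is only a warning that `-Wall` turns on. Returning `Either String Int` instead of a bare `Int` also covers the “mixed up the return type” case: a caller that treats the result as a plain number will not compile.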

From analyzing the bugs we found, I can say that most of them were either errors in the specification (that is, mistakes by your humble servant) or a specification the programmers misunderstood. But not local slips or forgetfulness.

This leads to a paradoxical conclusion: fixing bugs in a Haskell program is harder than in a dynamically typed language. In a dynamic language you just patch the next spot where a NoneType suddenly crawled out, whereas in Haskell you have to dig into the algorithm and argue with other people about the ambiguities in the specification.

Library Maturity

Spherical programs in a vacuum work only under spherical conditions in a vacuum. Everything else works in the real world, with millions of complex formats, specifications and protocols. So you need libraries. Thousands of them.

And, surprisingly, Haskell has plenty of them. From the standpoint of “just sit down and start writing production code”, yes, it works: you won't have to invent logging, SSL, an ORM, regular expressions, localization support, time handling, an HTTP server, and so on. Almost everything is ready. There were unpleasant moments, though. For example, we had to maintain the bson/MongoDB implementation for Haskell ourselves, because the venerable 10gen stopped supporting it.

At the same time, a Haskell program is not immune to segfaults either, because most programs link against libraries written in C, and a segfault means either a bug in the library or a programmer calling it incorrectly (and at that point the compiler no longer protects you). In a couple of places this led us to rewrite a library in pure Haskell (this is why, for example, we wrote Hen, which implements the subset of requests we need for working with Xen; commits for full support are welcome).
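A minimal sketch of where that protection ends, binding the C library's `strlen` through the standard FFI. `c_strlen` and `cLength` are illustrative names; the point is that GHC trusts the `foreign import` signature without checking it against the C side.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Sketch: calling C from Haskell through the FFI. The signature below is a
-- promise GHC does not verify against the C header: if it is wrong, or the
-- C side misbehaves, the result is a crash at runtime, not a type error.
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))

-- Binds the C standard library's size_t strlen(const char *).
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Wrapper: withCString makes a temporary NUL-terminated copy of the string
-- and frees it afterwards, so the pointer handed to C is valid for the call.
cLength :: String -> IO Int
cLength s = withCString s $ \p -> fromIntegral <$> c_strlen p

-- Note: c_strlen nullPtr would type-check just as well and then,
-- typically, segfault.
```

The `unsafe` keyword only means the call skips some runtime bookkeeping, which suits a short, non-blocking function like `strlen`; it says nothing about memory safety, which rests entirely on the wrapper passing a valid pointer.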

Compilation speed

I never thought this could be a problem, but it's a fact: building the project takes half an hour. And that is on very capable hardware with a bunch of cores and ultrafast storage underneath. Personally, after the first moments of pride (“wow, our program takes half an hour to compile”), it started to annoy me, because even a small bugfix means hello again to the familiar scene of waiting for the build.

Maintenance difficulty

Maintenance means making minor changes as they become relevant and locally figuring out “what's wrong”: in short, the routine of a living in-house project.

With maintenance, things turned out rather unpleasantly. Of course, a system administrator can be taught monadic computation too. But... well, you get the point. Where a sysadmin could find and fix something in the Python code when needed, the code is now a shaitan-arba (a devil's cart) with specially assigned people attached to change it, and you can only describe problems diagnostically (“it doesn't work here”, “it doesn't work there”). First, this somewhat spoils the devops synergy (when one part of the team does not understand the essence of what the other part has done, that is bad), and second, the requirements for the people who sit down at the code become very high.

Finding programmers is a problem, whatever Haskell's apologists may say. Even the “retraining” option is a problem, because functional programming, especially with “commands” and Template Haskell, requires a serious rewiring of your brain. In other words, the language is an additional obstacle that raises the barrier to entry.

Given the well-known shortage of programmers on the job market, this is a real obstacle. On the other hand, such things attract programmers who are tired of “PHP + JS, day in, day out”.

Development speed

I'm sure I will hear plenty of indignation from the language's apologists, including the programmers I have worked with. But the objective reality is that projects in Haskell are written more slowly than in Python. The counterarguments will point to a different programming style, more attention to detail, and so on, but my current conviction, based on practice, stands: the overall speed of delivering new functionality in Haskell is noticeably lower. Alas.

This is partly offset by the time saved on post-release debugging and catching all sorts of silly bugs, which in Python took up a decent chunk of time after the program was written and which is almost absent with Haskell. But even with that taken into account, Haskell still comes out slower.

Prototyping has a similar problem. In Python, a basic prototype is almost a copy-paste of what you did in the interactive environment, while in Haskell it is usually a kind of sacred ritual that takes a while (types, etc.) and only after some time produces a result. If the result then turns out to be “not quite what we dreamed of”, that becomes clear closer to the end rather than at the beginning. This raises the cost of each iteration in the search for a solution and makes the whole process less flexible.
