Node: Scaling in small versus scaling in general

Published on September 04, 2011

Original author: Alex Payne
Over the past few weeks, I have been using all the free time I can find to think about which technologies we will use to implement the first version of BankSimple. Many people will probably assume that I immediately favored Scala, since I co-authored a book about the language, but that is not how I approach engineering problems. Each problem has an appropriate set of technologies, and it is the developer's job to justify their use.

(By the way, Scala may well suit BankSimple, largely because of the amount of third-party Java code we must integrate with, but that is a topic for a different blog post, and most likely for a different blog.)

One of the most talked-about technologies on Hacker News is Node, an environment for developing and running event-driven JavaScript applications on the V8 virtual machine. As part of choosing technologies for the project, I evaluated Node. Yesterday I expressed some general skepticism about Node, and its author, Ryan Dahl, asked me to lay out my thoughts in more detail. So here goes.

To be clear, I am not out to discredit Ryan, a good guy and an excellent programmer who knows more about low-level C than most of us ever will, and who carries it without showing off (in the original: without the neckbeard). Nor am I criticizing the community of enthusiasts that has grown rapidly around Node; if you have found a tool you love working with and want to grow with it, more power to you.

Rather, the purpose of this article is to examine how well Node meets the second of its stated goals, one that seems to me to matter for a number of applications.

What is Node for?


The About section of the Node homepage reads:
“Node’s goal is to provide an easy way to build scalable network programs.”

A few paragraphs further down, it states:

“Because nothing blocks, less-than-expert programmers are able to develop fast systems [with Node].”

So, is Node's goal to provide an easy way to create scalable network programs, or to allow non-expert programmers to develop “fast systems”?

Although these goals may seem related, they are very different in practice. To understand why, we must distinguish between what I call “scaling in the small” and “scaling in general”.

Scaling in the small


In a small-scale system, pretty much everything works.

Modern hardware is capable enough that you can, for example, build a web application that supports thousands of users with one of the slowest programming languages available, terribly inefficient data store access, inefficient data storage patterns, no caching at all, no sensible distribution of work, no regard for the context of use, and so on. You can apply pretty much every available anti-pattern and still end up with a workable system, simply because the hardware copes even with poor choices.

This is wonderful, actually. It means we can prototype carelessly, using whatever technology we like, and those prototypes will often work better than we expected. Better yet, when we hit a bottleneck, it is trivial to work around it. Moving forward just means spending a few minutes thinking about your problem and choosing implementation technologies with slightly better performance characteristics than the ones you were using before.

Here, I think Node fits perfectly.

If you look at the people who use Node, they are largely web developers working in dynamic languages with what we might politely call limited performance characteristics. Adding Node to their architectures means these developers have gone from having no concurrency and very limited runtime performance to having relatively good concurrency, albeit of the strictly prescribed kind that Node imposes, on a virtual machine with relatively good performance. They cut out the painful part of their application that was better suited to an asynchronous implementation, rewrote it in Node, and moved on.

That is wonderful. This result certainly matches Node's stated secondary goal of letting “less-than-expert programmers” “develop fast systems”. However, it has very little to do with scaling in general, in the broader sense of the term.

Scaling in general


In a system of significant scale, there is no magic bullet.

When your system faces a torrent of work that needs to be done, no single technology will make everything better. When you work at large scale, you walk a razor's edge, coordinating a dance of well-applied technologies, development practices, statistical analysis, communication within the organization, intelligent engineering management, fast and reliable operation of hardware and software, vigilant monitoring, and so on. Scaling is hard. So hard, in fact, that the ability to scale is a deep competitive advantage, the kind you cannot simply download, copy, buy, or steal.

This is my criticism of Node’s main stated goal: "to provide an easy way to build scalable network programs." I fundamentally do not believe that there is an easy way to build scalable anything. People confuse easy problems with easy solutions.

If you have a problem that was easily solved by moving code from one extremely limiting technology to a slightly less limiting one, consider yourself lucky, but that does not mean you are operating at scale. Twitter won an easy victory when a part of the service, such as its homegrown Ruby message queue, was rewritten in Scala. That was great, but it was scaling in the small. Twitter is still fighting a hard battle to scale in general, because that means much, much more than choosing any one technology.

Growing with Node


For my part, I think Node will have a hard time growing with developers as they move from scaling in the small to scaling in general. (No, I am not arguing that “callbacks turn into a pile of spaghetti code”, though I think you hear that again and again because it really is a pain point for developers of asynchronous systems.)

A bold decision in Node's architecture is that all operations are asynchronous, down to file I/O, and I admire Ryan's commitment to consistency and clarity in carrying this thesis through his software. Engineers who deeply understand their systems' workloads may find places where the Node model fits well, and it may remain a good fit indefinitely; we do not know, because we have not yet seen long-lived, mature Node deployments. Most of the systems I have worked with change all the time. The workload changes. The data you work with changes along with the system. What used to fit an asynchronous solution is suddenly better served by a multi-threaded one, or vice versa, or you run into some other, entirely unforeseen change.
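
To make "asynchronous, down to file I/O" concrete, here is a minimal sketch in present-day Node using the callback form of fs.readFile. The helper name demoAsyncRead and the scratch file are assumptions invented for this illustration; they do not come from the original post.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// demoAsyncRead is a hypothetical helper for this illustration only.
// It records the order in which things happen around an async file read.
function demoAsyncRead(filePath, done) {
  const order = [];

  // fs.readFile returns immediately; the callback runs later, after the
  // read has completed and the current call stack has unwound.
  fs.readFile(filePath, "utf8", (err, contents) => {
    if (err) return done(err);
    order.push("callback ran: " + contents.trim());
    done(null, order);
  });

  // This line executes before the callback above.
  order.push("line after readFile ran");
}

// Usage: write a scratch file (assumed name), read it back asynchronously.
const scratchFile = path.join(os.tmpdir(), "node-async-demo.txt");
fs.writeFileSync(scratchFile, "hello from disk\n");
demoAsyncRead(scratchFile, (err, order) => {
  if (err) throw err;
  console.log(order.join("\n"));
  fs.unlinkSync(scratchFile); // clean up the scratch file
});
```

The code after the readFile call is recorded first, because the read only completes later on the event loop. Every I/O operation in a Node program has to be structured around that fact.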

If you are deeply invested in Node, you are locked into one way of achieving concurrency, one way of modeling your problems and solutions. If a solution does not fit the event model, you are stuck. On the other hand, if you work with a system that supports several different concurrency approaches (the JVM, the CLR, C, C++, GHC, etc.), you are able to change your concurrency model as your system evolves.
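
As a rough illustration of work that does not fit the event model, the sketch below assumes a CPU-bound task as the example. The helpers busyWait and measureTimerDelay are hypothetical names made up for this illustration. While the computation occupies the single event-loop thread, nothing else can run, not even a timer that was due long ago.

```javascript
// Hypothetical helper: keep the CPU busy for roughly `ms` milliseconds,
// standing in for any long synchronous computation.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU; the event loop cannot make progress during this loop
  }
}

// Hypothetical helper: schedule a timer for `timerMs`, then block the
// event loop for `blockMs`, and report how long the timer actually took.
function measureTimerDelay(timerMs, blockMs, done) {
  const start = Date.now();
  setTimeout(() => done(Date.now() - start), timerMs);
  busyWait(blockMs);
}

// Usage: ask for a 10 ms timer, but block for about 200 ms first.
measureTimerDelay(10, 200, (elapsed) => {
  console.log("timer asked for 10 ms, fired after about " + elapsed + " ms");
});
```

The timer cannot fire until the blocking work has finished, so the reported delay is at least as long as the blocking time. In a purely event-driven design, one badly placed computation delays every other piece of work in the process.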

At the moment, Node’s basic premise, that events necessarily mean high performance, is still in question. Researchers at the University of California, Berkeley found that “threads can have the strengths of an event model, including support for high concurrency, low overhead, and a simple concurrency model.” A later study building on that work shows that events and the pipeline model perform equally well, and that blocking sockets can actually increase performance. In the industrial Java world, it is periodically pointed out that non-blocking I/O is not necessarily better than threads. Even one of the most cited papers on the subject, provocatively titled “Why Threads Are a Bad Idea”, ends by concluding that you should not give up threads for high-performance servers. There is simply no one-size-fits-all solution for concurrency.

In fact, a hybrid approach to concurrency seems to be the way forward wherever nothing rules it out. Computer scientists at the University of Pennsylvania found that a combination of threads and events offers the best of both worlds. The Scala team at EPFL argues that Actors unify thread-based and event-based programming in one neat, easy-to-understand abstraction. Russ Cox, formerly of Bell Labs and now working on the Go programming language at Google, goes even further, arguing that the “threads versus events” debate is meaningless. (Note that none of this even touches the distribution aspect of scaling a system; threads are constructs for one computer, and events are constructs for one processor; we have not even talked about distributing work between machines in a simple way. That, by the way, is built into Erlang, which you should think about if you are nurturing a fast-growing system.)

The point is this: experienced developers use a mix of threads and events, as well as alternative approaches such as Actors and, experimentally, STM. To them, the idea that “non-blocking means fast” sounds at least a little silly; it belongs to the mythology of scalability. The people who are paid a lot of money to deliver scalable solutions are not frantically rewriting their systems in Node overnight. They are doing what they have always done: measuring, testing, benchmarking, thinking, and studying the scientific literature relevant to their problems. That is what scaling in general takes.

Conclusion


For my own investment of time, I would rather build on a system that lets me flexibly mix an asynchronous approach with other ways of modeling concurrency. A hybrid concurrency model may not be as simple and clean as Node's approach, but it will be more flexible. While BankSimple is in its infancy, we will face the happy problems of scaling in the small, and Node may be a smart choice for us at this early stage. But when we need to scale in general, I would prefer to have an assortment of options open to me, and I would not want to face the prospect of a big rewrite under pressure.

Node is a great piece of code with a community of enthusiasts, the accompanying hype, and a bright future. As a “unifying technology” that offers an immediate solution to early scaling problems, in a form especially accessible to a generation of web developers who come largely from dynamic languages, it makes sense. Node does appear to meet its secondary stated goal: getting acceptable performance out of less experienced developers who need to solve network-oriented problems. Node is very convenient and enjoyable for a certain type of programmer, and it is undeniably easy to get started with. The Node community is having a good time reinventing wheels modeled on well-known web frameworks, package managers, testing libraries, and so on, and I do not begrudge them that. Every programming community reinvents things early on, arriving at its own standards.

Now that we have worked out what Node is well suited to and what it is less suited to, it is important to remember that there is no panacea for problems of significant scale. Node and its strictly asynchronous, event-based approach should be seen as a very early point on the continuum of technologies and techniques that scaling in general involves.

Be careful with popular solutions. Anyone can talk about a hot new technology, but very few people actually work at the scale these technologies are meant for, stepping on all the rakes along the way. Those who do tend to rely on numbers and research, and are busy working with tools and methods that have proven themselves over time. If you invest your time in a new technology, be prepared to learn and grow with it, and perhaps to jump ship when you find yourself constrained.

It is not easy.