We load Node to the eyeballs (article 2 of 12 in the Node.js series from the Mozilla Identity team)
- Translation
- Tutorial
From the translator: this is the second article in the Node.js series from the Mozilla Identity team, which works on the Persona project. The article is based on a talk given by Lloyd Hilaiel at Node Philly 2012 in Philadelphia. All articles in the series:
- " Hunting for memory leaks in Node.js "
- "We load Node to the eyeballs "
- " We store sessions on the client to simplify application scaling "
- " Front End Performance. Part 1 - Concatenation, Compression, Caching "
- " We are writing a server that does not crash under load "
- " Front End Performance. Part 2 - Caching Dynamic Content Using etagify "
- " Taming web application configurations using node-convict "
- " Front End Performance. Part 3 - Font Optimization "
- " Localizing Node.js. Applications Part 1 "
- " Localizing Node.js. Applications Part 2: Toolkit and Process "
- " Localizing Node.js. Applications Part 3: Localization in Action "
- " Awsbox - PaaS Infrastructure for Deploying Node.js Applications on Amazon Cloud "
The Node.js process runs on a single processor core, so building a scalable server on Node requires special care. Due to the ability to write native extensions and a well-thought-out set of APIs for process control, there are several different ways to get Node to run code in parallel. We will consider them in this article.
In addition, we will introduce the compute-cluster module, a small library that facilitates the management of a collection of processes for performing distributed computing.
Problem statement
For Persona, we needed a server that could handle many requests with mixed characteristics, and we chose Node.js for the job. We had two main types of requests to process: "interactive" ones, which required no heavy computation and had to execute quickly to keep the application interface responsive, and "batch" ones, which took about half a second of processor time and could be delayed for a while without harming the user experience.
In search of the best application architecture, we considered long and hard how to handle these types of requests, weighing ease of use against the cost of scaling, and in the end formulated four basic requirements:
- Saturation. The solution must use all available processor cores.
- Responsiveness. The user interface must remain responsive. Always.
- Fault tolerance. When the load goes off the scale, we should serve as many clients as we can and show an error message to the rest.
- Simplicity. The solution must integrate easily and incrementally into an already running server.
Armed with these requirements, we can meaningfully compare different approaches.
Approach number 1. We just do everything in the main thread
When heavy computation is done in the main thread, the result is terrible: there is no saturation, since only one core is loaded, and no responsiveness or fault tolerance, since the application does not respond to any requests while the computation is running. The only advantage of this approach is simplicity.
function myRequestHandler(request, response) {
  // This hangs the whole application for a second or two.
  var results = doComputationWorkSync(request.somesuch);
}
Synchronous computing in a Node.js application that needs to process more than one request at a time is a bad idea.
Approach number 2. We do everything asynchronously
Asynchronous functions that run in the background will solve our problems, right?
Well, that depends on what "in the background" really means. If the function performing the computation is implemented so that it actually runs in the main thread, performance will be no better than with the synchronous approach. Take a look:
function doComputationWork(input, callback) {
  // Since the internal implementation of this asynchronous
  // function actually runs synchronously, in the main thread,
  // you will still block the whole process.
  var output = doComputationWorkSync(input);
  process.nextTick(function() {
    callback(null, output);
  });
}

function myRequestHandler(request, response) {
  // Even though this code *looks* better,
  // we will still hang the entire application.
  doComputationWork(request.somesuch, function(err, results) {
    // ... do something with the result ...
  });
}

The mere use of asynchronous APIs in Node does not guarantee an application that runs on multiple cores.
Approach number 3. Doing everything asynchronously with multithreaded libraries
With a properly written native library, it is quite possible to use multiple threads from a Node.js application. There are many such libraries; one example is node.bcrypt.js, written by Nick Campbell.
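To illustrate, here is a minimal sketch of how such a library is typically called: the hashing work is handed off to background threads while the event loop stays free (the callback style shown follows node.bcrypt.js's documented API, though details can vary between versions):

var bcrypt = require('bcrypt');

function hashPassword(password, callback) {
  // The expensive bcrypt computation runs on a background thread;
  // the callback fires on the main thread when it finishes.
  bcrypt.hash(password, 10, function(err, hash) {
    callback(err, hash);
  });
}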
On a machine with four cores the results look great: throughput quadruples, using all available resources. But run the application on a 24-core server and the picture is no longer so magical: the same four cores work while the rest sit idle.
The problem is that such libraries use Node's internal thread pool, which was never intended for this purpose and is strictly limited to only 4 threads.
And this is not the only problem:
- Filling Node's system thread pool with computational tasks can starve file and network operations, degrading responsiveness.
- There is no way to control the task queue. If the server already has 5 minutes of work queued up, do you really want to pile even more on top?
Libraries that rely on this kind of multithreading cannot saturate many cores, hurt responsiveness, and limit the application's ability to respond correctly to overload; that is, they damage fault tolerance.
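The first of these problems is easy to demonstrate. A sketch, using crypto.pbkdf2 as a stand-in for any library that runs on the shared pool (the numbers are illustrative):

var crypto = require('crypto');
var fs = require('fs');

// Queue a batch of CPU-heavy jobs on the shared thread pool.
for (var i = 0; i < 8; i++) {
  crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', function () {});
}

// This innocent file operation must now wait for a free pool
// thread behind the computational jobs above.
var start = Date.now();
fs.stat(__filename, function () {
  console.log('fs.stat took ' + (Date.now() - start) + ' ms');
});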
Approach number 4. We use the built-in clustering
Node.js version 0.6.x and higher ships with a built-in cluster module that lets you create several processes listening on the same socket in order to balance the load between them. What if we combine this feature with one of the previous approaches?
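A minimal sketch of the built-in cluster module (the handler and port are illustrative):

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per core; they all share one listening socket.
  for (var i = 0; i < numCPUs; i++) cluster.fork();
} else {
  http.createServer(function (req, res) {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(8000);
}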
By itself, though, such an architecture inherits the shortcomings of the previous approaches: we still get neither responsiveness nor fault tolerance.
Just running a few additional instances of the application is not always the right option.
Approach number 5. Introducing compute-cluster
For Persona, we solved the problem of parallelizing computation by creating a cluster of processes dedicated specifically to computational work. The result is the compute-cluster library.
compute-cluster spawns and manages processes, providing you with a convenient means of distributing work across child processes. Here's how to use it:
const computecluster = require('compute-cluster');
// create a compute cluster
var cc = new computecluster({ module: './worker.js' });
// run computations in parallel
cc.enqueue({ input: "foo" }, function (error, result) {
  console.log("foo done", result);
});
cc.enqueue({ input: "bar" }, function (error, result) {
  console.log("bar done", result);
});
The worker.js file must contain a handler for the message event to receive input:

process.on('message', function(m) {
  // Do all the heavy computation here without worrying about blocking
  // the main thread: this process exists specifically to carry out
  // one big task at a time.
  var output = doComputationWorkSync(m.input);
  process.send(output);
});
compute-cluster can be slotted behind existing asynchronous APIs without rewriting the calling code, enabling truly parallel computation with minimal changes to the program.
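For example, the doComputationWork function from approach number 2 might be reimplemented on top of compute-cluster while keeping its signature. A sketch, assuming worker.js sends back its result as shown above:

var computecluster = require('compute-cluster');
var cc = new computecluster({ module: './worker.js' });

// Same signature as before, so callers need not change,
// but the work now runs in a separate process.
function doComputationWork(input, callback) {
  cc.enqueue({ input: input }, function (error, result) {
    callback(error, result);
  });
}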
How does this approach fit our four requirements?
Saturation: the worker processes use all available cores.
Responsiveness: since the control process does nothing but spawn child processes and pass messages to them, it spends most of its time handling interactive requests. Even if the machine is 100% loaded, the control process can be given a higher priority in the operating system's scheduler.
Simplicity: this solution is easy to integrate into an existing project. By hiding everything behind a simple asynchronous API, compute-cluster leaves the calling code blissfully unaware of the implementation details.
What about fault tolerance during a sudden traffic spike? After all, our goal is to work as efficiently as possible while serving the maximum number of clients.
compute-cluster does more than spawn processes and pass messages. It keeps track of how many tasks are already running and how long, on average, a single task takes. Thanks to this information, it can reliably predict how long a request will take to complete before it is queued.
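That prediction comes down to simple arithmetic. A sketch of the kind of estimate involved (the function and variable names are illustrative, not compute-cluster internals):

// Everything already queued must drain through the workers
// before a new task runs, and then the task itself takes a turn.
function expectedCompletionSeconds(queueLength, avgTaskSeconds, numWorkers) {
  return (queueLength / numWorkers + 1) * avgTaskSeconds;
}

// e.g. 20 queued tasks, 0.5 s each, 4 workers => about 3 s
console.log(expectedCompletionSeconds(20, 0.5, 4));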
The max_request_time parameter sets the maximum acceptable time to complete a request. Attempting to queue a request will fail with an error if the predicted completion time exceeds this limit. For example, a requirement of the form "the user should not wait more than 10 seconds for authorization to complete" can be expressed by setting max_request_time to 7 seconds (leaving a 3-second reserve for possible network delays). Load testing of compute-cluster has shown promising results: even under extreme load, authorized users could continue to use the system, while some of those who tried to sign in to the overloaded server received an error message right away.
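In configuration terms that might look like this (a sketch assuming the option is passed to the constructor in seconds, as in the example above; consult the compute-cluster documentation for the exact format):

var computecluster = require('compute-cluster');

var cc = new computecluster({
  module: './worker.js',
  // refuse work whose predicted completion time exceeds 7 seconds,
  // keeping a 3-second reserve under the 10-second requirement
  max_request_time: 7
});

cc.enqueue({ input: "foo" }, function (error, result) {
  if (error) {
    // the queue is too long: tell this client the server is overloaded
    return console.log("server busy:", error);
  }
  console.log("foo done", result);
});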
What's next?
Parallelizing at the application level with processes works well only in a single-tier architecture, where there is just one type of node and scaling simply means adding more of them. As an application grows more complex, however, the architecture evolves toward several tiers separated for performance or security reasons.
Besides multiple tiers, highly loaded applications often need to be deployed across several geographically distant data centers, and scaling may also mean adding cloud resources on demand. Multi-tier architecture, geographic distribution, and dynamically attached cloud resources significantly change the parameters of the scaling problem, while the goal remains the same.
Possible directions for compute-cluster's development include distributing tasks across the tiers of a complex application, coordinating between data centers to handle local load spikes, and using cloud resources on demand.
If you have any ideas or suggestions for improving compute-cluster, I would be glad to hear them. Join the Persona discussion on our mailing list. Thank you for reading!