Differences between asynchronous and multithreaded architectures, using Node.js and PHP as examples

Recently, platforms built on an asynchronous architecture have been on the rise. The asynchronous model underlies nginx, one of the fastest web servers in the world. Server-side JavaScript, in the form of Node.js, is developing actively. What is good about this architecture? How does it differ from the classic multithreaded system? A great many articles have been written on the topic, but far from all of them give a complete understanding of the subject. Debates around Node.js vs PHP + Apache flare up regularly. Many people do not understand why some things can be done in Node.js but not in PHP, or vice versa: why perfectly correct, working PHP code will slow down dramatically in Node.js, or even hang it. In this article I would like to explain the difference between their architectures in detail once more. As examples of the two systems, let's take a web server with PHP and Node.js.

Multithreaded model


This model is familiar to everyone. Our application creates a number of threads (a pool), passing each of them a task and data to process. Tasks are executed in parallel. If the threads share no data, there is no synchronization overhead, which makes the work fast enough. After a task is completed, the thread is not killed but stays in the pool waiting for the next task; this removes the overhead of creating and destroying threads. This is how a web server with PHP works: each script runs in its own thread, and one thread handles one request. With a large number of threads, slow requests occupy a thread for a long time while fast requests are processed almost instantly, freeing the thread for other work. This prevents slow requests from taking all the CPU time and forcing fast requests to wait.

But such a system has certain limitations. A situation may arise when a large number of slow requests arrive, for example ones that work with a database or the file system. Such requests will occupy all the threads, making it impossible to serve anything else: even a request that needs only 1 ms to execute will not be processed in time. This can be mitigated by increasing the number of threads so that a sufficiently large number of slow requests can be handled at once. Unfortunately, threads are scheduled by the OS, and the OS itself consumes CPU time; the more threads we create, the greater the scheduling overhead and the less processor time each thread receives. The situation is aggravated by PHP itself: its blocking operations on the database, file system, and I/O consume thread time while doing no useful work at that moment.

Let us look more closely at blocking operations. Imagine the following situation: we have several threads. Each processes requests consisting of 1 ms of handling the request itself, 2 ms of querying and receiving data from the database, and 1 ms of rendering the received data. In total we spend 4 ms per request. When a thread sends a query to the database, it starts waiting for a response; until the data comes back, the thread does no work at all. That is 2 ms of idling out of a 4 ms request! Yes, we cannot render the page without the data from the database, so we have to wait. But as a result the processor sits idle 50% of the time, and on top of that come the OS costs of allocating processor time to each thread, which grow with the number of threads. In the end we get quite a lot of idle time, and this time depends directly on the duration of the queries to the database and the file system. The best solution, which lets us load the processor fully with useful work, is to move to an architecture built on non-blocking operations.
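The idle-time arithmetic above can be sketched directly. The figures below are the hypothetical timings from the example, not measurements:

```javascript
// Hypothetical per-request timings taken from the example above.
const parseMs = 1;   // handling the request itself
const dbWaitMs = 2;  // waiting for the database (the thread is blocked here)
const renderMs = 1;  // rendering the received data

const totalMs = parseMs + dbWaitMs + renderMs; // 4 ms per request
const idleFraction = dbWaitMs / totalMs;       // share of time the thread does nothing

console.log(totalMs + 'ms total, ' + idleFraction * 100 + '% idle'); // 4ms total, 50% idle
```

With 100 threads all stuck in the 2 ms wait, half the pool's wall-clock time buys no work at all, which is exactly the waste the non-blocking model eliminates.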

Asynchronous model


A less common model than the multithreaded one, but no less capable. The asynchronous model is built on an event queue and an event loop. When an event occurs (a request arrived, a file was read, a response came from the database), it is placed at the end of the queue. The thread that processes this queue takes an event from the head of the queue and executes the code associated with it. While the queue is not empty, the processor stays busy. This is how Node.js works: we have a single thread processing the event queue (with the cluster module there can be more than one). Almost all operations are non-blocking. Blocking variants are also available, but their use is strongly discouraged; you will soon see why. Let's take the same example with a 1 + 2 + 1 ms request: an event associated with the arrival of the request is taken from the queue. We handle the request, spending 1 ms. Then an asynchronous, non-blocking database query is issued, and control returns immediately. We can take the next event from the queue and execute it. For example, we take another request, handle it, send its database query, return control, and do the same once more. And now the database response to the very first request arrives. The event associated with it is placed in the queue. If the queue was empty, it is executed right away: the data is rendered and sent back to the client. If something is already in the queue, it has to wait for the other events to be processed. Usually the speed of handling a single request is comparable to that of a multithreaded system with blocking operations; in the worst case, waiting for other events takes extra time and the request is processed more slowly.
But during the time a system with blocking operations would simply be waiting 2 ms for a response, the system with non-blocking operations has managed to execute two more parts of two other requests! Each individual request may end up a little slower, but per unit of time we can process many more requests; overall throughput is higher, and the processor is always busy with useful work. At the same time, far less time is spent processing the queue and moving from event to event than on switching between threads in a multithreaded system. That is why asynchronous systems with non-blocking operations need no more threads than the machine has cores. Node.js initially worked only in single-threaded mode, and to use the processor fully you had to start several copies of the server manually and distribute the load between them, for example with nginx. Now there is a cluster module for working with multiple cores (at the time of writing it is still experimental). This is where the key difference between the two systems becomes clear. A multithreaded system with blocking operations has a lot of idle time: an excessive number of threads creates a lot of overhead, while too few threads slow things down under a large number of slow requests. An asynchronous non-blocking application uses processor time more efficiently, but is harder to design. This especially shows in memory leaks: a Node.js process can run for a very long time, and if the programmer does not take care to clean up data after processing each request, the resulting leak will gradually force the server to be restarted. There is also an asynchronous architecture with blocking operations, but it is far less profitable, as some of the examples below will show.
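The overlap of waits is easy to demonstrate. In the sketch below, setTimeout stands in for a non-blocking database call; that is an assumption for illustration, but a real driver hands control back and later invokes a callback in exactly the same way:

```javascript
const log = [];

function handleRequest(id) {
  log.push('request ' + id + ': parsed');     // synchronous CPU work (~1 ms in the example)
  setTimeout(function () {                    // non-blocking "database query"
    log.push('request ' + id + ': rendered'); // runs when the "response" arrives
  }, 2);                                      // ~2 ms of simulated database latency
}

handleRequest(1);
handleRequest(2); // starts before request 1's "database response" arrives

// At this point both requests are parsed but neither is rendered yet:
// the two 2 ms waits overlap instead of adding up.
console.log(log); // [ 'request 1: parsed', 'request 2: parsed' ]
```

A blocking version of the same two requests would need 8 ms of wall-clock thread time; here the single thread is free during both waits.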
Let us highlight the features that need to be considered when developing asynchronous applications, and analyze some mistakes people make when trying to cope with the peculiarities of the asynchronous architecture.

Do not use blocking operations. Never


Well, at least until you fully understand the architecture of Node.js and can work with blocking operations accurately and deliberately.
When switching from PHP to Node.js, some people may want to write code in the same style as before. Indeed, if we need to read a file first and only then process it, why can't we write the following code:

var fs = require('fs');
var data = fs.readFileSync("img.png"); // blocks: nothing else runs until the whole file is read
response.write(data);

This code is correct and perfectly working, but it uses a blocking operation. This means that until the file is read, the event queue will not be processed and Node.js will simply hang, doing no work at all. This defeats the main idea entirely. While the file is being read, we could be doing other work. For that, we use the following construction:

var fs = require('fs');
fs.readFile("img.png", function(err, data){ // called once the read completes
	response.write(data);
});

Let us examine it in more detail: this is an asynchronous read of the file. When the read function is called, control is immediately passed on, and Node.js processes other requests. As soon as the file is read, the anonymous function passed to readFile as the second parameter is called. More precisely, the event associated with it is placed in the queue, and when its turn comes, it is executed. Thus the sequence of actions is preserved: the file is first read, then processed. But we do not burn processor time on waiting; we let other events in the queue be processed. This circumstance is very important to remember, since just a few carelessly inserted synchronous operations can greatly degrade performance.
Use this code, and you will hopelessly kill the event-loop:

var fs = require('fs');
var dataModified = false;
var myData;

fs.readFile("file.txt", function(err, data){
	dataModified = true;
	myData = data + " last read " + new Date();
});

while (true){
	if(dataModified)
		break;
}

response.write(myData);

Such a piece of code will consume all the CPU time by itself, not letting any other events be processed. Until the check succeeds, the loop keeps repeating and no other code executes; in fact, the check can never succeed, because the readFile callback that sets dataModified will never get a chance to run. If you need to wait for an event, then... use events!

var fs = require('fs');
var events = require('events');
var myData;
var eventEmitter = new events.EventEmitter();

fs.readFile("file.txt", function(err, data){
	myData = data + " last read " + new Date();
	eventEmitter.emit('dataModified', myData);
});

eventEmitter.on('dataModified', function(data){
	response.write(data);
});

Again, this code is executed only after a certain condition is fulfilled. But the check does not run in a loop: the code that fulfills our condition triggers an event with the emit function, and we attach a handler to that event. The events.EventEmitter object is responsible for creating and dispatching our events; eventEmitter.on executes code when a specific event occurs.
In these examples you can see how careless use of blocking code stops the processing of the event queue and, with it, all of Node.js. To prevent such situations, use asynchronous code tied to events: use asynchronous operations instead of synchronous ones, and use asynchronous, event-based checks for the occurrence of some condition.

Do not use large cycles for data processing. Use events


What if we need a loop over a huge amount of data? What if we need a loop that runs for the entire lifetime of the program? As we figured out above, large loops block the queue. When the need for such a loop does arise, we replace it with event creation: each iteration of the loop creates an event for the next iteration and puts it in the queue. Thus we first process all the events that were waiting in the queue at that moment, and only then proceed to the new iteration, without blocking the queue.

function incredibleGigantCycle(){
	cycleProcess();                          // one iteration of work
	process.nextTick(incredibleGigantCycle); // schedule the next iteration as an event
}

This code will execute the body of the loop and will create an event for the next iteration. There will be no blocking of the event queue in this case.

Do not create large operations that take a lot of CPU time.


Sometimes you need to process a huge amount of data or implement a resource-intensive algorithm (although writing heavy number-crunching in Node.js is not the best idea). Such a function can take a lot of processor time (say, 500 ms), and until it finishes, many small requests will sit idle in the queue. What if such a function is unavoidable and we cannot drop it? In that case the way out is to split the function into several parts that are called in turn as events. Each part's event goes to the end of the queue, so the events that arrived earlier get processed in between, without waiting for our heavyweight algorithm to finish completely. Your code should not contain large consecutive chunks of work that are not broken up into separate events. Of course, there is also the option of writing your own module in C, but that is a different story.
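A minimal sketch of such splitting, with a long array sum standing in for the heavy algorithm (an assumption for illustration). It uses setImmediate to queue each chunk; the same pattern can be built on process.nextTick as shown above, and the chunk size is an arbitrary choice:

```javascript
// Summing a big array in chunks so other queued events can run in between.
function sumInChunks(arr, chunkSize, callback) {
  var total = 0;
  var i = 0;
  function step() {
    var end = Math.min(i + chunkSize, arr.length);
    for (; i < end; i++) total += arr[i];  // a bounded slice of the heavy work
    if (i < arr.length) {
      setImmediate(step);                  // yield: let other queued events run first
    } else {
      callback(total);                     // all chunks processed
    }
  }
  step();
}

sumInChunks([1, 2, 3, 4, 5], 2, function (sum) {
  console.log('sum = ' + sum); // sum = 15
});
```

Instead of one 500 ms block, the queue sees many short events, so a 1 ms request arriving mid-computation waits only until the current chunk finishes.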

Watch what type of function you are using.


Read the documentation to understand whether you are using a synchronous or asynchronous, blocking or non-blocking function. In Node.js, synchronous functions are conventionally named with the Sync suffix. In asynchronous functions, the handler invoked upon completion is usually passed as the last parameter and is called callback. If you use an asynchronous function where synchronous behavior is expected, you may run into errors.

var fs = require('fs');
fs.readFile("img.png", function(err, data){
	// "data" is only available here, once the read completes
});
response.write(data); // BUG: runs immediately, before the file has been read

Let's break this code down. A non-blocking, asynchronous read of the file begins. Control is immediately passed on, and a response is written to the user. But at this point the file has not yet been read. Accordingly, we send an empty answer (in this snippet, data is not even defined in the outer scope, so the call would actually throw). Do not forget that when working with asynchronous functions, the code that processes the result must always be located inside the callback. Otherwise the result is unpredictable.

Understand the benefits of asynchronous requests


Questions sometimes come up: why do you have to write "spaghetti code" in Node.js, constantly nesting callbacks inside each other, when in PHP everything is clear and sequential? After all, the algorithm is the same in both cases.
Let's compare the following code:

 $user->getCountry()->getCurrency()->getCode()

and

user.getCountry(function(country){
	country.getCurrency(function(currency){
		console.log(currency.getCode())
	})
})


In both cases, processing continues only after all three requests are complete. But there is a significant difference: in PHP, our database requests are blocking. The first request executes, with some processor idle time; then the second, with the same idling; then the third. With an asynchronous non-blocking architecture, we send the first request and start executing operations belonging to other events. When the database response comes back, we process it, form the second request, send it, and continue handling other events. In the end, both systems perform three sequential requests. But in PHP the processor idles during each one, while Node.js executes other useful code in the meantime and may even manage to serve several requests that need no database access at all.

Conclusion


You need to know and understand these features of Node.js; otherwise, when switching to it from PHP, you may not only fail to improve the performance of your project but significantly worsen it. Node.js is not just another language on another platform; it is a different kind of architecture. If you respect the features of the asynchronous architecture, you will reap the benefits of Node.js. If you persist in writing programs the way you would write them in PHP, expect nothing from Node.js but frustration.
