Everything you wanted to know about processing requests, but were too shy to ask
What is a network service? This is a program that accepts incoming requests over the network and processes them, possibly returning answers.
There are many aspects in which network services differ from each other. In this article I will focus on how to handle incoming requests.
Choosing how to process requests has far-reaching implications. How to make a chat service that can handle 100,000 simultaneous connections? What approach to choose to extract data from a stream of weakly structured files? Wrong choice will lead to waste of time and effort.
The article discusses such approaches as a process / thread pool, event-oriented processing, half sync / half async pattern, and many others. Numerous examples are given, pros and cons of approaches, their features and areas of application are considered.
Subject ways of processing requests is not new, see, for example: one , two . However, most articles consider it only partially. This article aims to fill in the blanks and provide a consistent statement of the question.
The following approaches will be considered:
- sequential processing
- request process
- thread per request
- process / thread pool
- event-oriented processing (reactor pattern)
- half sync / half async pattern
It should be noted that the service processing requests is not necessarily a network service. This may be a service that receives new tasks from the database or task queue. This article refers to network services, but you need to understand that the approaches under consideration have a wider scope.
At the end of the article is a list with a brief description of each of the approaches.
An application consists of a single thread in a single process. All requests are processed only sequentially. There is no parallelism. If several requests come to the service at the same time, one of them is processed, the rest are in the queue.
The advantage of this approach is the ease of implementation. There are no locks and competition for resources. The obvious disadvantage is the inability to scale with a large number of clients.
An application consists of a main process that accepts incoming requests and workflows. For each new request, the main process creates a workflow that processes the request. Scaling by the number of requests is simple: each request gets its own process.
There is nothing complicated about this architecture either, but it has
- The process consumes a lot of resources.
Try creating 10,000 simultaneous connections to a PostgreSQL RDBMS and look at the result.
- Processes do not have shared memory (default). If you need access to shared data or shared cache, you will have to save spiced memory (call linux mmap, munmap) or use external storage (memcahed, redis)
These problems are not stop. Below it will be shown how they are managed in the PostgeSQL RDBMS.
Advantages of this architecture:
- The fall of one of the processes will not affect the rest. For example, the error of processing a rare case will not drop the entire application, only the processed request will suffer.
- Differentiation of access rights at the operating system level. Since the process is the essence of the OS, its standard mechanisms can be used to differentiate access rights to the OS resources.
- You can change the running process on the fly. For example, if a separate script is used to process a request, then to replace the processing algorithm, it suffices to change the script. An example will be discussed below.
- Multi-core machines are effectively used.
- PostgreSQL RDBMS creates a new process for each new connection. Shared memory is used for working with general data. The problem of high resource consumption processes in PostgreSQL can be solved in different ways. If there are few customers (a dedicated stand for analysts), then there is no such problem. If there is a single application that accesses the database, you can create a pool of database connections at the application level. If there are many applications, pgbouncer can be used.
- sshd listens to incoming requests on port 22 and forks at every connection. Each ssh connection is a fork of an sshd daemon that accepts and executes user commands sequentially. Thanks to this architecture, the resources of the OS itself are used to differentiate access rights.
- An example from own practice. There is a stream of unstructured files from which to get metadata. The main process of the service distributes files to handler processes. Each process handler is a script that takes a file path as a parameter. File processing takes place in a separate process, therefore, due to a processing error, the entire service does not drop. To update the processing algorithm, it is enough to change the processing scripts without stopping the service.
In general, it must be said that this approach has its advantages, which determine its scope, but the scalability is very limited.
Flow per request
This approach is very similar to the previous one. The difference is that instead of processes threads are used. This allows you to use shared memory out of the box. However, other advantages of the previous approach cannot be used, while resource consumption will also be high.
- Shared memory "out of the box"
- Ease of implementation
- Efficient use of multi-core CPUs
- The thread consumes a lot of resources. On unix-like operating systems, the thread consumes almost as many resources as the process
MySQL can be used as an example. But it should be noted that MySQL uses a mixed approach, so this example will be discussed in the next section.
Process / thread pool
Flows (processes) to create expensive and long. In order not to waste resources, you can use the same thread repeatedly. By limiting additionally the maximum number of threads, we get a pool of threads (processes). Now the main thread accepts incoming requests and puts them in a queue. Workflows take requests from the queue and process them. This approach can be seen as a natural scaling of the sequential processing of requests: each worker thread can process threads only sequentially, combining them into a pool allows processing requests in parallel. If each stream can handle 1000 rps, then 5 threads will handle a load close to 5000 rps (subject to minimal competition for shared resources).
The pool can be created in advance at the start of the service or be formed gradually. Using a thread pool is more common because allows you to use shared memory.
The size of the thread pool does not have to be limited. The service can use free threads from the pool, and if there are none, create a new thread. After the request has been processed, the thread joins the pool and waits for the next request. This option is a combination of a query approach and a pool of threads. Below is an example.
- use of many CPU cores
- reducing the costs of creating a thread / process
- Limited scalability by the number of simultaneous clients. Using a pool allows us to reuse the same thread multi-graphically without additional resource costs, but it does not solve the fundamental problem of the large amount of resources consumed by a thread / process. Creating a chat service that can handle 100,000 simultaneous connections using this approach will fail.
- Scalability is limited by shared resources, for example, if threads use shared memory, adjusting access to it using semaphores / mutexes. This is a limitation of all approaches that use shared resources.
- Python application launched with uWSGI and nginx. The main uWSGI process receives incoming requests from nginx and distributes them between interpreter Python processes that process requests. The application can be written on any uWSGI-compatible framework - Django, Flask, etc.
- MySQL uses a thread pool: each new connection is processed by one of the free threads from the pool. If there are no free threads, then MySQL creates a new thread. The size of the pool of free threads and the maximum number of threads (connections) are limited by the settings.
Perhaps this is one of the most common approaches to building network services, if not the most common. It allows you to scale well, reaching large rps. The main limitation of the approach is the number of simultaneously processed network connections. In fact, this approach works well only if the requests are short or there are few customers.
Event-oriented processing (reactor pattern)
Two paradigms - synchronous and asynchronous - are eternal rivals of each other. So far, it has only been about synchronous approaches, but it would be wrong to ignore the asynchronous approach. Event-oriented or reactive request processing is an approach in which each IO operation is performed asynchronously, and when the operation completes, the handler is called. As a rule, the processing of each request consists of a set of asynchronous calls followed by the execution of handlers. At any given moment, a single-threaded application executes the code of only one handler, but the execution of handlers of various requests alternates with each other, which allows us to simultaneously (pseudo-parallelly) process many parallel requests.
A full consideration of this approach is beyond the scope of this article. For more in-depth review, we can recommend Reactor (Reactor) . What is NodeJS speed secret? , Inside NGINX . Here we confine ourselves only to the consideration of the pros and cons of this approach.
- Efficient scaling for rps and number of simultaneous connections. A reactive service can simultaneously process a large number of connections (tens of thousands) if most connections wait for I / O to complete.
- The complexity of the development. Programming in the asynchronous style is more complicated than in the synchronous one. The query processing logic is more complex, and debugging is also more complicated than in synchronous code.
- Errors leading to the blocking of the entire service. If a language or runtime environment is not designed initially for asynchronous processing, then a single synchronous operation can block the entire service, negating the scaling options.
- Difficult to scale by the CPU cores. This approach assumes the presence of a single thread in a single process; therefore, it is impossible to use several CPU cores at the same time. It should be noted that there are ways to get around this limitation.
- Corollary to the preceding paragraph: this approach does not scale well for requests demanding on the CPU. The number of rps for this approach is inversely proportional to the number of CPU operations required to process each request. Requirements to the CPU queries nullify the benefits of this approach.
- Node.js uses the out-of-box reactor pattern. For details, see What is the secret of NodeJS speed?
- nginx: worker processes (worker process) nginx'a use the reactor pattern for parallel processing of requests. See Inside NGINX for more details.
- A C / C ++ program that directly uses OS facilities (epoll on linux, IOCP on windows, kqueue on FreeBSD), or using a framework (libev, libevent, libuv, etc.).
Half sync / half async
The title is taken from POSA: Patterns for Concurrent and Networked Objects . In the original, this pattern is interpreted very broadly, but for the purposes of this article I will understand this pattern somewhat already. Half sync / half async is a query processing approach that uses a lightweight control flow (green flow) for each request. A program consists of one or more operating system level threads, but the program execution system supports green threads that the OS does not see and cannot control.
A few examples to make the review more specific:
- Service in the language of Go. The Go language supports many lightweight execution threads - gorutin. The program uses one or more OS threads, but the programmer operates with gortines, which are transparently distributed among the OS threads to enable multi-core CPUs.
- Python service with gevent library. The gevent library allows a programmer to use green streams at the library level. The entire program is executed in a single OS thread.
In essence, this approach is designed to combine the high performance of the asynchronous approach with the simplicity of programming synchronous code.
When using this approach, despite the illusion of synchronism, the program will work asynchronously: the program execution system will control the event loop, and each "synchronous" operation will actually be asynchronous. When such an operation is called, the execution system will call an asynchronous operation using the OS tools and register the handler to complete the operation. When the asynchronous operation is completed, the execution system will call a previously registered handler, which will continue the execution of the program at the call point of the "synchronous" operation.
As a result, the half sync / half async approach contains both some advantages and some disadvantages of the asynchronous approach. The size of the article does not allow to consider this approach in all details. For those interested, I advise you to read the chapter of the same name in the POSA book : Patterns for Concurrent and Networked Objects .
By itself, the half sync / half async approach introduces a new green flow entity - a lightweight control flow at the level of the program or library system. What to do with green streams - the choice of the programmer. It can use a pool of green streams, it can create a new green stream for each new request. The difference compared to OS threads / processes is that green threads are much cheaper: they consume much less RAM and are created much faster. This allows you to create a huge number of green streams, for example, hundreds of thousands in the Go language. Such a huge amount makes justified the use of the "green flow on request" approach.
- It scales well by rps and number of simultaneous connections.
- The code is easier to write and debug compared to the asynchronous approach.
- Since the execution of operations is actually asynchronous, programming errors are possible when a single synchronous operation blocks the entire process. This is especially felt in languages where this approach is implemented by means of the library, for example Python.
- The opacity of the program. When using threads or OS processes, the program execution algorithm is clear: each thread / process performs operations in the sequence in which they are written in code. When using the half sync / half async approach, operations that are written in code sequentially can interchange in unpredictable ways with operations that process parallel requests.
- Unsuitability for real-time systems. Asynchronous request processing greatly complicates the provision of guarantees for the processing time of each individual request. This is a consequence of the previous paragraph.
Depending on the implementation, this approach scales well with the CPU cores (Golang) or does not scale at all (Python).
This approach as well as asynchronous allows you to handle a large number of simultaneous connections. But programming a service using this approach is easier, because The code is written in the synchronous style.
As the name suggests, in this approach, requests are processed down the pipeline. The processing process consists of several threads of the OS, arranged in a chain. Each thread is a link in a chain; it performs a certain subset of the operations necessary to process a request. Each request sequentially passes through all the links in the chain, and different links process different requests at each moment in time.
- This approach scales well with rps. The more links in the chain, the more requests are processed per second.
- Using multiple threads allows you to scale well across the CPU cores.
- Not for all categories of requests fit this approach. For example, it will be difficult and inconvenient to organize long polling with this approach.
- The complexity of implementation and debugging. Breaking the sequential processing into stages so that performance is high may not be easy. Debugging a program in which each request is processed alternately in several threads running in parallel is more difficult than sequential processing.
- An interesting example of pipelining was described in the highload report 2018 The Evolution of the Architecture of the Moscow Exchange Trading and Clearing System
Conveyor processing is widely used, but most often the links are separate components in independent processes that exchange messages, for example, through a message queue or database.
A brief summary of the approaches considered:
- Synchronous processing.
A simple approach, but very limited in scalability, both in terms of rps and the number of simultaneous connections. It does not allow using multiple CPU cores at the same time.
- New process for each request.
The high costs of creating processes. The approach does not allow to scale effectively by the number of simultaneous connections, it has difficulties when using shared memory. It is quite suitable for long-running queries with a small number of simultaneous connections. It has properties that are useful for some applications (increased reliability, access control at the OS level).
- New thread for each request.
The problems are the same as in the previous approach, but it makes it easy to use shared memory. It has a similar scope to the previous approach, but it lacks some of its useful properties.
- Pull processes / threads.
Compared to the two previous approaches, it avoids the costs of creating processes / threads. The most commonly used approach for building network services. It scales well by rps and number of cores used. Good for handling a large number of short requests. Badly scaled by the number of simultaneous connections.
- Event-oriented processing (reactor pattern).
It scales well with rps and the number of simultaneous connections. It is more difficult to use because of the asynchronous programming style, difficult floating errors are possible. Scaling by the number of used CPU cores is difficult
- Half sync / half async.
It scales well with rps and the number of simultaneous connections. Depending on the implementation, it scales well with the CPU cores (Golang) or does not scale at all (Python). Spends much less resources on request than the approach process (flow) on request. It is programmed in a synchronous style, unlike the reactor pattern, but the same floating errors are possible as in the reactor pattern.
- Conveyor processing.
Allows you to achieve high performance, but difficult to implement approach. Not suitable for all types of requests (for example, long polling will be difficult to do).
The list above is not exhaustive, but it contains basic approaches to processing requests.
I appeal to the reader: what approaches do you use? What are the pros and cons, features of their work did you learn from your own experience?
- Related articles:
- Event-oriented approach:
- Comparing approaches based on streams and events:
- Half sync / half async:
- Green streams:
- Conveyor processing: