The Architecture of Open Source Applications: How nginx Works

Original author: Andrew Alexeev


We at Latera build billing systems for telecom operators, write on Habr about the development of our product, and publish interesting technical translations. Today we present an adapted translation of a chapter from the book “The Architecture of Open Source Applications” that describes the origins, architecture, and inner workings of the popular nginx web server.

The value of concurrency


Nowadays the Internet is everywhere, and it is hard to imagine that just 10-15 years ago the global network was far less developed. It has evolved from simple clickable HTML websites served by NCSA and then Apache web servers into an always-on communication medium used by billions of people around the world. The number of permanently connected devices keeps growing, and the Internet landscape is shifting as whole sectors of the economy move online. Online services are becoming more complex, and their success requires the ability to retrieve information instantly. The security side of online business has also changed significantly. Today's sites are therefore much more complex than before and, in general, demand far more engineering effort to remain robust and scalable.

One of the main challenges for site architects has always been concurrency. Since the beginning of web services, the level of concurrency has grown steadily. It is not unusual today for a popular site to serve hundreds of thousands or even millions of users simultaneously. Not so long ago, high concurrency came from slow ADSL or dial-up connections. Now it comes from mobile devices and new application architectures that demand a constant, fast connection: the client expects live updates to tweets, news, and social media feeds. Another important factor is the changed behavior of browsers, which open four to six simultaneous connections to a site to speed up page loading.

Imagine a simple Apache-based server that produces relatively short 100 KB responses — a simple web page with text or images. It may take a fraction of a second to generate or retrieve the page, but transferring it to a client with 80 kbit/s (10 KB/s) of bandwidth takes 10 seconds. The web server "pulls" the 100 KB of content relatively quickly, and then spends 10 seconds slowly feeding it to the client. Now imagine 1000 concurrently connected clients requesting the same content. If each client costs 1 MB of additional memory, serving 100 KB of content to 1000 clients consumes a full gigabyte of memory.

With persistent (keep-alive) connections, the problem of handling concurrency becomes even more pronounced, because clients stay connected to avoid the latency of establishing new HTTP connections.

To handle the increased workloads that come with a growing Internet audience and, consequently, rising levels of concurrency, the foundation of a site must be built from highly efficient blocks. All components of this equation matter — hardware (CPU, memory, disks), network capacity, application architecture, and data storage — but it is the web server that accepts and processes client requests. It must therefore be able to scale nonlinearly with the growing number of simultaneous connections and requests processed per second.

Apache issues


The Apache web server still has a large presence on the Internet. The roots of the project go back to the early 1990s, and its architecture was originally tailored to the operating systems and hardware of that time, and to the overall state of the Internet, when a website was typically a single physical server running a single instance of Apache. By the early 2000s it was obvious that the single-physical-server model could not be replicated effectively to satisfy the needs of growing web services. Although Apache provides a solid foundation for further development, it was designed to spawn a copy of itself for each new connection, which under modern conditions prevents the necessary scalability.

Ultimately, Apache grew a powerful ecosystem of third-party extensions that gives developers almost any tool for building applications. But everything has a price, and in this case the price of packing so many tools into a single piece of software is reduced scalability.

Traditional process- or thread-based models of handling concurrent connections imply handling each connection in a separate process or thread with blocking I/O. Depending on the application, this approach can be extremely inefficient in terms of CPU and memory consumption. Spawning a separate process or thread requires preparing a new runtime environment, including allocating heap and stack memory and creating a new execution context. All of this costs additional CPU time, which can eventually lead to performance problems from excessive context switching. All of these problems surface fully in web servers of the old architecture, such as Apache.



A practical comparison of the two most popular web servers was published on Habr in this article.

Nginx web server architecture overview


From the very start, nginx was meant to be a specialized tool for achieving higher performance and more economical use of server resources, while enabling the dynamic growth of a website. The result is an asynchronous, modular, event-driven architecture.

Nginx makes heavy use of multiplexing and event notification, assigning specific tasks to dedicated processes. Connections are handled in a highly efficient run loop inside a limited number of single-threaded processes called workers. Within each worker, nginx can handle many thousands of concurrent connections and requests per second.

Code structure


An nginx worker includes the core and the functional modules. The core is responsible for maintaining the run loop and executing the appropriate sections of module code at each stage of request processing. The modules provide most of the application-layer functionality: they read from and write to the network and storage, transform content, perform output filtering and, in proxy mode, pass requests to upstream servers.

The modular architecture of nginx lets developers extend the web server's feature set without modifying its core. There are several kinds of nginx modules: core modules, event modules, phase handlers, protocols, filters, load balancers, variable handlers, and others. Nginx does not currently support dynamically loaded modules; they are compiled in together with the core at build time. The developers plan to add loadable-module functionality in the future.

To orchestrate the various activities related to accepting, processing, and managing network connections and retrieving content, nginx uses the event notification mechanisms and a number of disk I/O performance enhancements available on Linux, Solaris, and BSD systems, including kqueue, epoll, and event ports.
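
For illustration, here is a minimal sketch of how these mechanisms surface in the configuration. Nginx normally autodetects the best available method at build and run time, so the use directive is usually omitted; the connection count below is illustrative:

    events {
        use epoll;                # Linux; kqueue on BSD, eventport on Solaris
        worker_connections 4096;  # maximum simultaneous connections per worker
    }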

A high-level view of the nginx architecture is shown in the figure below:



Worker Model


As noted above, nginx does not create a separate process or thread for each connection. Instead, worker processes accept new requests from a shared "listen" socket and run a highly efficient event loop inside each worker, which allows a single worker to handle thousands of connections. Nginx has no special mechanism for distributing connections among the workers; that work is done by the OS kernel. At startup, the set of listen sockets is created, and the workers then continuously accept, read from, and write to the sockets while processing HTTP requests and responses.

The most complex part of the nginx worker code is the description of the run loop. It includes all kinds of internal calls and relies heavily on asynchronous task handling. Asynchronous operations are implemented through modularity, event notifications, extensive use of callbacks, and fine-tuned timers. The overarching goal is to be as non-blocking as possible. The only situation where nginx may still resort to blocking is when a worker process faces insufficient disk storage performance.

Because nginx does not spawn processes or threads per connection, the web server is in the vast majority of cases very conservative and extremely efficient with memory. It also saves CPU cycles, since there is no constant create-and-destroy pattern for processes and threads. Nginx checks the state of the network and storage, initializes new connections, adds them to the run loop, and processes them asynchronously to completion, after which a connection is deactivated and removed from the loop. Thanks to this mechanism, along with careful use of system calls and quality implementations of supporting interfaces such as the pool and slab memory allocators, nginx achieves low to moderate CPU usage even under extreme loads.

Using several worker processes to handle connections also makes the web server scale well across multiple cores. Multi-core architectures are used efficiently by creating one worker process per core, and this design also avoids lock contention and thread thrashing. Resource control mechanisms are isolated within single-threaded worker processes; this model also scales better across physical storage devices, enables higher disk utilization, and avoids blocking on disk I/O. As a result, server resources are used more efficiently and the load is spread across several worker processes.

The number of nginx workers can vary with different CPU and disk usage patterns. The web server's developers recommend that system administrators try several configurations to get the best performance results. If the pattern is CPU-intensive — for example, handling a large number of TCP/IP connections, doing compression, or using SSL — the number of workers should match the number of cores. If the load is mostly disk-bound — for example, loading and serving large volumes of content from storage — the number of workers can be one and a half to two times the number of cores.
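
As a sketch, this tuning comes down to a single directive; the values below assume a four-core machine, and newer nginx versions also accept worker_processes auto:

    worker_processes 4;    # one worker per core for CPU-bound loads (SSL, gzip)
    # worker_processes 6;  # 1.5-2x the core count for disk-bound loads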

In future versions of the web server, the nginx developers plan to solve the problem of blocking on disk I/O. As of this writing, if storage performance is insufficient, a worker process performing disk operations may be blocked from reading or writing. To minimize the chances of that happening, various combinations of configuration directives and existing mechanisms can be used; for example, the sendfile and AIO options usually yield a significant improvement in storage throughput.
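
A hedged example of what such a combination might look like; the location and sizes are illustrative, and exact behavior is OS-dependent (on Linux, aio has historically required directio for unbuffered reads):

    location /video/ {
        sendfile  on;   # transmit files in kernel space, avoiding a user-space copy
        aio       on;   # asynchronous file I/O where the OS supports it
        directio  4m;   # bypass the page cache for files larger than 4 MB
    }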

Another problem with the current worker model is the limited support for embedded scripting. In the standard nginx distribution, only Perl scripts can be embedded. The reason is simple: an embedded script can block on an operation or exit unexpectedly. In either case the worker process hangs, which can affect thousands of connections at once.

Nginx process roles


Nginx runs several processes in memory: one master process and several workers. There are also a couple of special-purpose processes, the cache manager and the cache loader. In nginx versions 1.x all processes are single-threaded, and they all communicate with each other through shared memory. The master process runs as root; the service and worker processes run without superuser privileges.

The master process is responsible for the following tasks:

  • reading and validating the configuration;
  • creating, binding, and closing sockets;
  • starting, terminating, and maintaining the configured number of worker processes;
  • reconfiguring without service interruption;
  • controlling non-stop binary upgrades (launching new binaries and rolling back to the previous version if necessary);
  • reopening log files;
  • compiling embedded Perl scripts.



The internals of nginx have also been described on Habr in this article.

Worker processes accept and handle connections from clients, provide reverse proxying and filtering functionality, and do almost everything else nginx is capable of. In general, to observe the current state of a web server instance, a system administrator should watch the workers, since they are the processes that reflect it best.

The cache loader process is responsible for checking the cache items on disk and updating the metadata kept in memory. It prepares nginx instances to work with files already stored on disk: it walks the directories, examines the metadata of the cached content, updates the relevant entries in shared memory, and then exits.

The cache manager is mainly responsible for keeping the cache current. It stays in memory during normal operation of the web server, and in case of failure it is restarted by the master process.

A brief overview of caching in nginx


Caching in nginx is implemented as a hierarchical data store on the file system. Cache keys are configurable, and different request parameters control what goes into the cache. The cache keys and metadata are stored in shared memory segments accessible to the workers as well as to the cache loader and the cache manager. At the moment nginx has no in-memory file caching beyond the optimizations offered by the operating system's virtual file system mechanisms. Each cached response is placed in a separate file on the file system; the hierarchy is controlled through nginx configuration directives. When a response is written into the cache directory structure, the path and file name are derived from the MD5 hash of the proxied URL.

Content is placed in the cache as follows: when nginx reads a response from the upstream server, the content is first written to a temporary file outside the cache directory structure. When the web server finishes processing the request, it renames the temporary file and moves it into the cache directory. If the temporary-files directory is located on a different file system, the file has to be copied, so it is recommended to keep the temporary and cache directories on the same file system. It is also quite safe to delete files straight from the cache directory when content needs to be purged; in addition, there are third-party extensions for nginx that provide remote access to the cached content.
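
A minimal caching sketch tying these pieces together (paths, zone name, sizes, and the backend address are illustrative): keys_zone allocates the shared memory segment for keys and metadata, and keeping proxy_temp_path on the same file system lets a finished temporary file be renamed into the cache rather than copied:

    http {
        proxy_cache_path /var/cache/nginx levels=1:2
                         keys_zone=my_cache:10m max_size=1g inactive=60m;
        proxy_temp_path  /var/cache/nginx_tmp;  # same file system as the cache

        server {
            location / {
                proxy_cache     my_cache;
                proxy_cache_key "$scheme$host$request_uri";
                proxy_pass      http://127.0.0.1:8080;  # illustrative upstream
            }
        }
    }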

Nginx configuration


The nginx configuration system was inspired by Igor Sysoev's experience with Apache. He believed that a web server needs a scalable configuration system, and the main scalability problem appears when one has to maintain a large number of complicated configurations with many virtual servers, directories, and data sets. Maintaining and scaling a relatively large web infrastructure can otherwise turn into hell.

As a result, nginx configuration was designed in such a way as to simplify the routine operations of supporting the web server and provide tools for further expansion of the system.

The nginx configuration is stored in several plain-text files, usually located in /usr/local/etc/nginx or /etc/nginx. The main configuration file is usually called nginx.conf. To keep it readable, parts of the configuration can be split into separate files, which are then included into the main one. Note that nginx does not support .htaccess files: all of the configuration must live in a centralized set of files.
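
A typical layout of the main file might look like this (the paths and values are illustrative; exact locations vary between installations):

    # /etc/nginx/nginx.conf
    worker_processes 2;

    events { worker_connections 1024; }

    http {
        include mime.types;                # MIME type map shipped with nginx
        include /etc/nginx/conf.d/*.conf;  # per-site configuration fragments
    }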

The initial reading and validation of the configuration files is performed by the master process. A compiled, read-only form of the configuration becomes available to the worker processes as they are forked from the master process. The configuration structures are shared automatically by the usual virtual memory management mechanisms.

There are several different contexts for blocks and directives: main, http, server, upstream, location (and mail for the mail proxy). For example, a location block can never be placed inside the main block. Also, to avoid unnecessary complexity, there is no such thing as a "global web server" configuration in nginx. As Sysoev puts it:

Locations, directories, and other blocks inside a global web server configuration are something I never liked in Apache, so they never appeared in nginx.
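
A minimal skeleton showing how the contexts nest (all names and paths are illustrative); a location block is valid only inside a server block, never at the main level:

    # main context
    events { }

    http {
        server {
            listen      80;
            server_name example.com;

            location /images/ {   # valid: location nests inside server
                root /var/www;
            }
        }
    }
    # a location block here, at the main level, would be a configuration error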

The syntax and formatting of the nginx configuration follow a C-style convention. Although some nginx directives resemble parts of an Apache configuration, setting up the two web servers is a very different experience. For example, nginx supports rewrite rules, but an administrator would have to manually adapt a legacy Apache rewrite configuration to the nginx style. The implementation of the rewrite engine also differs.

Nginx also offers several useful mechanisms of its own, such as variables and the try_files directive. Variables in nginx provide a powerful mechanism for controlling the run-time configuration of the web server. They can be used with various configuration directives to add flexibility to the conditions under which requests are processed.

The try_files directive was originally conceived as a replacement for conditional if statements and as a way to quickly and efficiently match URIs against different content mappings.
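
A common pattern combining variables with try_files (the backend address and location names are illustrative): serve the requested file if it exists, then try the directory, and otherwise hand the request to a backend — with no if blocks involved:

    location / {
        try_files $uri $uri/ @fallback;  # $uri is the normalized request URI
    }

    location @fallback {
        proxy_pass http://127.0.0.1:8080;  # illustrative backend
    }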

Nginx internals


Nginx consists of a core and a number of modules. The core is responsible for laying the foundation of the web server and its web and mail reverse proxy functionality. It also enables the use of the underlying network protocols, builds the runtime environment, and ensures seamless interaction between the modules. However, most protocol- and application-specific features are implemented by modules, not by the core.

Connections are processed by nginx through a pipeline, or chain, of modules. In other words, for every operation there is a module doing the relevant work: for example, compression, content modification, server-side includes, communication with external servers via the FastCGI or uwsgi protocols, or talking to memcached.

A couple of modules sit between the core and the "functional" modules: the http and mail modules. They provide an additional level of abstraction between the core and the lower-level components. With their help, the sequences of events tied to a particular network protocol such as HTTP, SMTP, or IMAP are handled. Together with the core, these high-level modules maintain the correct order of calls to the respective functional modules. Currently the HTTP protocol is implemented as part of the http module, but the developers plan to split it out into a separate functional module, driven by the need to support other protocols (for example, SPDY).

Most existing modules complement the nginx HTTP functionality, but event and protocol modules are also used to work with mail. Event modules provide an event notification mechanism for various operating systems - for example, kqueue or epoll. The choice of module used by nginx depends on the build configuration and the capabilities of the operating system. Protocol modules allow nginx to work through HTTPS, TLS / SSL, SMTP, POP3, and IMAP.

This is what a typical HTTP request processing loop looks like:

  1. The client sends an HTTP request.
  2. The nginx core selects the appropriate phase handler based on the location configured for the request.
  3. If proxying is enabled, the load balancer picks an upstream server to proxy to.
  4. The phase handler does its job and passes each output buffer to the first filter.
  5. The first filter passes the output to the second filter.
  6. The second filter passes the output to the third filter (and so on).
  7. The final response is sent to the client.

Module invocation in nginx is configurable and is done through callbacks with pointers to executable functions. The downside is that a developer writing their own module must define exactly how and when it should run. For example, hooks exist at the following points:
  • Before the configuration file is read and processed.
  • When the main configuration finishes initializing.
  • When the server (host/port) is initialized.
  • When the server configuration is merged with the main configuration.
  • When the master process starts or exits.
  • When a new worker process starts or exits.
  • When a request is handled.
  • When the response header and body are filtered.
  • When a request to an upstream server is initiated or re-initiated.
  • When a response from an upstream server is processed.
  • When interaction with an upstream server is finished.

Inside a worker, the sequence of actions leading to the run loop where the response is generated looks like this:

  • ngx_worker_process_cycle() begins.
  • Events are processed with OS-specific mechanisms (such as epoll or kqueue).
  • Events are accepted and the relevant actions dispatched.
  • The request header and body are processed or proxied.
  • The response content (header and body) is generated and streamed to the client.
  • The request is finalized.
  • Timers and events are re-initialized.

In more detail, the processing of an HTTP request can be broken down as follows:

  1. Request processing is initialized.
  2. The header is processed.
  3. The body is processed.
  4. The associated handler is called.
  5. The request runs through the processing phases.

While being processed, a request passes through several phases, and at each of them the corresponding handlers are called. Typically a handler performs four tasks: it obtains the location configuration, generates an appropriate response, sends the header, and then sends the body. A handler takes one argument: a structure describing the request. The request structure holds a lot of useful information, such as the request method, the URI, and the headers.

After reading the HTTP request header, nginx looks up the associated virtual server configuration. If the virtual server is found, the request goes through six phases:

  1. The server rewrite phase.
  2. The location search phase.
  3. The location rewrite phase (which can bring the request back to the previous phase).
  4. The access control phase.
  5. The try_files phase.
  6. The logging phase.
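
To make the phases tangible, here is a hedged sketch mapping common directives to the phases in which they run (paths and addresses are illustrative):

    server {
        rewrite ^/old/(.*)$ /new/$1;           # 1: server rewrite phase

        location /new/ {                       # 2: location search phase
            rewrite ^/new/x$ /new/y;           # 3: location rewrite phase
            allow 10.0.0.0/8;                  # 4: access control phase
            deny  all;
            try_files $uri /fallback.html;     # 5: try_files phase
            access_log /var/log/nginx/new.log; # 6: logging phase
        }
    }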

When generating content in response to a request, nginx passes the request to various content handlers. First, the request may be served by one of the so-called unconditional handlers such as perl, proxy_pass, flv, or mp4. If the request does not match any of these content handlers, it is passed down the chain to the following handlers, in order: random index, index, autoindex, gzip_static, static.

If a specialized module such as mp4 or autoindex is not appropriate, the content is treated as a file or directory on disk (that is, as static content), and the static content handler serves it.
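
An illustrative configuration showing both kinds of handlers: mp4 and proxy_pass claim their requests outright, while everything else falls through to index, autoindex, and the static handler (the mp4 directive requires a build with ngx_http_mp4_module):

    location ~ \.mp4$ {
        mp4;                               # unconditional handler (mp4 module)
    }

    location /api/ {
        proxy_pass http://127.0.0.1:8080;  # unconditional handler (proxy)
    }

    location / {
        root      /var/www/html;           # static content handler territory
        index     index.html;
        autoindex on;                      # directory listings
    }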

After that, the content is passed to the filters, which operate according to a particular design: a filter gets called, starts working, and calls the next filter, and so on until the last filter in the chain has been called. There are header filters and body filters. The work of a header filter consists of three basic steps:

  1. Deciding whether to operate on this response.
  2. Operating on the response.
  3. Calling the next filter.

Body filters transform the generated content. Their possible operations include:

  • Server-side includes (SSI).
  • XSLT filtering.
  • Image filtering (for example, resizing images on the fly).
  • Charset modification.
  • Gzip compression.
  • Chunked encoding.
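
For illustration, several of these filters can be switched on declaratively; module availability depends on the build (image_filter, for example, lives in the optional ngx_http_image_filter_module), and the locations below are illustrative:

    location / {
        gzip       on;                # gzip compression filter
        gzip_types text/plain text/css application/json;
        charset    utf-8;             # charset (encoding) filter
    }

    location /thumbnails/ {
        image_filter resize 150 100;  # resize images on the fly
    }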

After passing through the filter chain, the response is handed to the writer. There are also two special filters, copy and postpone: the copy filter fills memory buffers with the relevant response content, and the postpone filter is used for subrequests.

Subrequests are a very important and very powerful mechanism for request and response processing. With subrequests, nginx can return a result built from URLs other than the one the client originally requested. Some web frameworks call this an internal redirect, but nginx goes further: not only can filters perform multiple subrequests and combine their output into a single response, subrequests can also be nested and hierarchical. A subrequest can perform its own sub-subrequest, which in turn can initiate a sub-sub-subrequest.

Subrequests can map to files on disk, to other handlers, or to upstream servers. They are extremely useful for inserting additional content based on data from the original request. For example, the SSI (server-side includes) module uses a filter to parse the contents of the returned document and then replaces the include directives with the content of the specified URLs. Similarly, one could write a filter that treats the entire contents of a document as a URL to be retrieved and then appends the new document to the URL itself.
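
A short sketch of the SSI case (the fragment path is illustrative): enabling the filter makes nginx parse responses for include commands, each of which triggers an internal subrequest:

    location / {
        ssi on;   # parse responses for SSI commands
        # an HTML page served from here can then contain, e.g.:
        # <!--# include virtual="/fragments/header.html" -->
        # which nginx resolves through an internal subrequest
    }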

Nginx also has load balancing modules and upstream modules. Upstream modules prepare the content to be sent to an upstream server and receive the response from it; no output filter calls are made here. An upstream module registers callbacks to be invoked when the upstream server is ready to be written to or read from. Callbacks exist for the following functionality:

  • Crafting the request buffer (or a chain of them) to be sent to the upstream server.
  • Re-initializing the connection to the upstream server (which happens right before the request is created).
  • Processing the first bits of the response and saving pointers to the payload received from the server.
  • Aborting the request (which happens when the client terminates prematurely).
  • Finalizing the request once nginx finishes reading from the upstream server.
  • Trimming the response body (for example, removing a trailer).

Load balancing modules attach to the proxy_pass handler to provide the ability to choose an upstream server when more than one is configured. The upstream and load balancing machinery makes it possible to detect failed servers and redirect requests to the remaining functional nodes.
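
A hedged sketch of an upstream group with load balancing and failover (addresses and parameters are illustrative); a server that exceeds max_fails within fail_timeout is temporarily considered failed:

    upstream backend {
        server 10.0.0.1:8080 weight=2;                      # gets twice the share
        server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
        server 10.0.0.3:8080 backup;                        # used only if the others fail
    }

    server {
        location / {
            proxy_pass http://backend;   # the balancer picks a server per request
        }
    }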

Nginx also includes interesting modules that provide additional sets of variables for use in the configuration file. Variables in nginx are mostly created and updated inside various modules, but there are two modules dedicated entirely to variables: geo and map. The geo module makes it easy to classify clients by IP address: it can create arbitrary variables whose values depend on the client's address. The other module, map, allows the creation of variables from other variables, essentially enabling flexible mapping of host names and other run-time variables.
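
Two illustrative snippets (the variable names, addresses, and pool names are hypothetical): geo derives a variable from the client address, and map derives one variable from another:

    geo $denied {                  # set $denied based on the client IP
        default      0;
        192.0.2.0/24 1;
    }

    map $http_host $backend_pool { # set $backend_pool based on the Host header
        default           "pool_a";
        admin.example.com "pool_b";
    }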

The memory allocation mechanisms in nginx worker processes were, to some extent, inspired by Apache. A high-level description of memory handling in nginx: for each connection, the necessary memory buffers are dynamically allocated, linked, and used to store and manipulate the header and body of the request and the response, and then freed when the connection is released. Nginx tries hard to avoid copying data around in memory: most data is passed along by pointer, not by calls to memcpy.

The task of managing memory allocation is solved by a special nginx pool allocator. Shared memory zones are used to hold mutexes, the cache metadata, the SSL session cache, and the information related to bandwidth policing and management (limits); a slab allocator implemented in nginx manages allocations inside that shared memory. Safe concurrent use of shared memory is provided by locking mechanisms (mutexes and semaphores). To organize complex data structures, nginx uses an implementation of red-black trees, which keep cache metadata in shared memory, track non-regex location definitions, and serve a few other tasks.
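
One place where these shared memory zones are directly visible to the administrator is rate limiting. In this illustrative snippet (zone name, size, and rate are hypothetical), the 10 MB zone holds per-address state shared by all worker processes:

    http {
        limit_req_zone $binary_remote_addr zone=req_per_ip:10m rate=10r/s;

        server {
            location /login/ {
                limit_req zone=req_per_ip burst=20;  # queue short bursts
            }
        }
    }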
