NGINX - a history of rebirth under Windows
Since we are having something of an "nginx week" here (see, for example, here or here), I will try to make my contribution, so to speak. It is about nginx for Windows, namely a more or less official build for this proprietary and, for some, not particularly beloved platform.
Why Windows? It's simple: in the corporate sector, Windows on the server, and on workstations, is often a mandatory requirement. And there is no escaping these platform requirements when, for example, they are voiced by the client in ultimatum form.
And since we have Windows, but I don't feel like tormenting myself with IIS, Apache and the like, and I want to use my favorite tools (and nginx is definitely one of them), we sometimes have to put up with certain limitations on this platform. Or rather, we used to...
Although it should be noted that even with these limitations, nginx gives almost any web server under Windows a run for its money on many counts, including stability, memory consumption and, most importantly, performance.
I hasten to share the good news right away: there are now practically no restrictions critical for high performance when using nginx under Windows, and the last of the critical ones will, in all likelihood, soon disappear as well. But first things first...
The known problems of nginx for Windows are described here, namely:
- A worker process can handle no more than 1024 simultaneous connections.
- The cache and other modules that require shared memory support do not work under Windows Vista and later, because address space layout randomization is enabled on these versions of Windows.
- Although it is possible to start several worker processes, only one of them actually does any work.
I have changed the order a bit, because this is the sequence in which I ran into these limitations, sorted "historically", so to speak.
1024 concurrent connections
In fact, this is not true, or rather, not quite true: since time immemorial nginx could be built under Windows without this restriction; you just had to define FD_SETSIZE at build time equal to the number of connections you need. For example, for VS, by adding the option --with-cc-opt=-DFD_SETSIZE=10240, the nginx worker will be able to manage 10K simultaneous connections, provided you also specify worker_connections 10240; in the configuration.
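To make the two pieces concrete, here is a minimal sketch of the build flag together with the matching configuration; the numbers are the ones from the text, the surrounding layout is illustrative:

```nginx
# Build nginx with an enlarged select() set, e.g.:
#   configure ... --with-cc-opt=-DFD_SETSIZE=10240
# Then, in nginx.conf, ask for the same number of connections per worker:
events {
    # must not exceed the FD_SETSIZE the binary was compiled with
    worker_connections  10240;
}
```

Both values have to agree: the configuration directive cannot raise the limit above what the binary was compiled with.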
Cache and other modules requiring shared memory support
Until recently, all these functions and modules really did not work under Windows, starting with the x64 versions and wherever the entire system runs with ASLR enabled by default.
Moreover, disabling ASLR for nginx alone changes nothing, because the functions for working with shared memory are wired deep into the kernel, i.e. ASLR (and apparently DEP along with it; for some reason it did not work with DEP either) must be disabled for the entire system.
This is actually not a small list of functionality: the cache, any zones and, accordingly, limit_req, etc. By the way, without shared memory support it would have been much harder to remove the third restriction, i.e. to implement support for multiple workers under Windows.
I will not bore the reader with how I struggled with this; suffice it to say that together with Max (thanks, mdounin) we got it into the release version. For those interested, a little about this is under the spoiler, or in the sources at hg.nginx.org or github...
A bit of theory ...
Shared memory as such can be used even with address space randomization; one does not really interfere with the other. It is just that with ASLR enabled you are almost guaranteed to get a pointer to the "same memory" in a different process, but at a different address. This is actually not critical, as long as the contents of that region do not contain direct pointers: offsets relative to the start address of the shared memory remain valid, but raw pointers as such do not.
Accordingly, short of rewriting all the functionality that works with pointers inside the shared memory in nginx, there is only one option: to trick, or rather force, Windows into mapping the shared memory at a constant address for all worker processes. After that, everything is actually not that difficult.
You can read the beginning of the discussion about this here. Maxim, by the way, fixed a problem I had missed (remapping), which sometimes occurs after restarting the workers (reload on the fly).
Viva open source!
I.e. officially this restriction is no longer valid as of release 1.9.0, dated Apr 28, 2015:
Feature: shared memory can now be used on Windows versions with
address space layout randomization.
Only one worker process really works
Nginx has a master process and child processes called workers.
Under Windows, nginx can start several worker processes, i.e. specifying worker_processes 4; in the configuration will make the master start four child worker processes. The problem is that only one of them, having "stolen" the listening socket from the master (using SO_REUSEADDR), will really listen on that socket, i.e. accept incoming connections. As a result, the other workers get no incoming connections and hence no work. This limitation stems from the technical implementation of winsock, and the only way to get a listening socket shared by all worker processes on Windows is to clone the socket from the master process, i.e. use a handle inherited from its socket.
Those interested in the implementation details can find them under the spoiler or in the source, so far only on my github.
More details
To begin with, even if you start child processes (CreateProcess) with bInheritHandle=TRUE, and set SECURITY_ATTRIBUTES::bInheritHandle to TRUE when creating the socket, you will most likely not succeed: in the worker process, any use of this handle yields "failed (10022: An invalid argument was supplied)". And even having "successfully" duplicated this socket with DuplicateHandle, the duplicated handle will likewise be rejected by every function working with sockets (typically with error 10038 - WSAENOTSOCK). Why this happens is revealed by a quote from MSDN - DuplicateHandle:
Sockets. No error is returned, but the duplicate handle may not be recognized by Winsock at the target process. Also, using DuplicateHandle interferes with internal reference counting on the underlying object. To duplicate a socket handle, use the WSADuplicateSocket function.

The catch is that to duplicate a handle using WSADuplicateSocket, you need to know the pid of the target process in advance, i.e. this cannot be done before the process is started.
As a result, in order to pass the child process the information obtained by the master from WSADuplicateSocket, which is needed to create a clone of the socket in the worker process, we have two options: either use some kind of IPC, for example as described in MSDN - WSADuplicateSocket, or pass it via shared memory (which, fortunately, we have already fixed above).
I chose the second option, because I believe it is the less time-consuming of the two and the fastest way to implement connection inheritance.
Below are the changes to the algorithm for starting worker processes under Windows (new steps are marked with *):

- The master process creates all listener sockets;
- [cycle] The master process creates a worker process;
- * [win32] The master calls a new function, ngx_share_listening_sockets: for each listener socket, inheritance information is requested specifically for this new worker ("cloned" via WSADuplicateSocket for its pid) and stored in shared memory as shinfo (a protocol structure);
- The master process waits until the worker sets a ready event ("worker_nnn");
- * [win32] The worker process calls a new function, ngx_get_listening_share_info, to obtain the shinfo inheritance information, which is used to create a new socket descriptor for the shared listener socket of the master process;
- * [win32] The worker process creates all listener sockets using the shinfo information from the master process;
- The worker process sets its ready event ("worker_nnn");
- The master process stops waiting and creates the next worker process, repeating [cycle].
For reference, here is a link to the discussion of the fix.
As a result, nginx under Windows now starts N worker processes that are "full-fledged" in terms of listening and, most importantly, establishing connections, and that process incoming connections truly in parallel.
This fix is actually still sitting as a pull request (I sent the changeset to nginx-devel), but you can already try it, for example by downloading it from my github and building it yourself under Windows. If there is interest, I will post a binary somewhere.
I tortured my hardware for quite a while, hammering it with tests and load "scripts"; the result: all the workers are loaded more or less evenly and really do work in parallel. I also tried reloading nginx on the fly (reload) and randomly "killed" individual workers to simulate a crash: everything works without the slightest complaint.
So far only one "flaw" has surfaced, IMHO: if you run

netstat /abo | grep LISTEN

you will see only the master process in the list of listeners, although in reality it simply never establishes a connection; only its child worker processes do. By the way, my experience so far suggests that accept_mutex should probably be disabled on the Windows platform ("accept_mutex off;"), because at least on my test systems the workers ran noticeably slower with accept_mutex enabled than with it disabled. But I think everyone should verify this experimentally, because it depends on a bunch of parameters such as the number of cores, workers, keep-alive connections, etc. And what would this be without the customary pretty tables of performance comparison numbers, before (the first column, marked **NF) and after.
The test was done on Windows 7, i5-2400 CPU @ 3.10GHz (4 cores).
The request: static content, 452 bytes (+ headers), small gif icons.
| Workers x Concur. | 1 x 5 (**NF) | 2 x 5 | 4 x 5 | 4 x 15 |
|---|---|---|---|---|
| Transactions | 5624 hits | 11048 hits | 16319 hits | 16732 hits |
| Availability | 100.00% | 100.00% | 100.00% | 100.00% |
| Elapsed time | 2.97 secs | 2.97 secs | 2.97 secs | 2.96 secs |
| Data transferred | 2.42 MB | 4.76 MB | 7.03 MB | 7.21 MB |
| Response time | 0.00 secs | 0.00 secs | 0.00 secs | 0.00 secs |
| Transaction rate | 1893.60 trans/sec | 3719.87 trans/sec | 5496.46 trans/sec | 5645.07 trans/sec |
| Throughput | 0.82 MB/sec | 1.60 MB/sec | 2.37 MB/sec | 2.43 MB/sec |
| Concurrency | 4.99 | 4.99 | 4.99 | 14.92 |
| Successful transactions | 5624 | 11048 | 16319 | 16732 |
| Failed transactions | 0 | 0 | 0 | 0 |
| Longest transaction | 0.11 secs | 0.11 secs | 0.11 secs | 0.11 secs |
| Shortest transaction | 0.00 secs | 0.00 secs | 0.00 secs | 0.00 secs |
And may nginx be with you, under Windows too.