Benchmark of HTTP servers (C/C++) on FreeBSD



We compared the performance of HTTP server cores built with seven C/C++ libraries, as well as (for educational purposes) some ready-made solutions in this area (nginx and node.js).

An HTTP server is a complex and interesting mechanism. There is a saying that a programmer who has never written a compiler is a poor programmer; I would replace "compiler" with "HTTP server": it involves parsing, networking, asynchronous multithreading, and much more...

Testing across all possible parameters (serving static content, dynamic content, various encryption modules, proxying, etc.) would take more than a month of painstaking work, so the task was simplified: we compare the performance of the cores. The core of an HTTP server (like that of any network application) is the socket event manager plus some primary mechanism for processing those events (implemented as a pool of threads, processes, etc.). It also includes the HTTP request parser and the response generator. At first glance everything should reduce to testing the capabilities of one or another system mechanism for handling asynchronous events (select, epoll, kqueue, etc.), their meta-wrappers (libev, boost.asio, etc.) and the OS kernel; in practice, however, a specific turnkey implementation produces a significant difference in performance.
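For reference: on FreeBSD the native socket event mechanism is kqueue(2), and the meta-wrappers listed above are essentially portable facades over it (or over epoll/select on other systems). A minimal sketch of such an event loop in C, purely illustrative, with error handling omitted:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    /* Minimal kqueue(2) skeleton: register a listening socket and
     * dispatch readability events. Illustrative only. */
    void event_loop(int listen_fd)
    {
        int kq = kqueue();
        struct kevent change, events[64];

        /* register interest in "readable" on the listening socket */
        EV_SET(&change, listen_fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &change, 1, NULL, 0, NULL);

        for (;;) {
            int n = kevent(kq, NULL, 0, events, 64, NULL);
            for (int i = 0; i < n; i++) {
                if ((int)events[i].ident == listen_fd) {
                    /* accept(2) the connection and register it the same way */
                } else {
                    /* read the request, write the response */
                }
            }
        }
    }

libev, libevent and Boost.asio hide exactly this kind of loop behind a callback API; the test participants differ in what they build on top of it.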

I implemented my own version of the HTTP server on top of libev. Of course, it supports only a small subset of the requirements of the notorious RFC 2616 (it is unlikely that any HTTP server fully implements it), just the minimum needed to meet the requirements for the participants of this test:

  1. Listen for requests on port 8000;
  2. Check the method (GET);
  3. Check the path in the request (/answer);
  4. The response must contain:
                HTTP/1.1 200 OK
                Server: bench
                Connection: keep-alive
                Content-Type: text/plain
                Content-Length: 2

                42

  5. For any other method or path, a response with error code 404 (Not Found) must be returned.

As you can see: no extensions, no disk file access, no gateway interfaces, etc.; everything is simplified as much as possible.
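To make the test target concrete, a single-threaded libev-based core meeting the requirements above might look roughly like the sketch below. This is not the exact code used in the tests, just a minimal illustration assuming libev 4.x (built with -lev); real request parsing, partial reads, and error handling are omitted.

    /* bench_server.c: minimal sketch of the test target (libev 4.x).
     * We naively assume the whole request arrives in one read. */
    #include <ev.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static const char RESP_200[] =
        "HTTP/1.1 200 OK\r\n"
        "Server: bench\r\n"
        "Connection: keep-alive\r\n"
        "Content-Type: text/plain\r\n"
        "Content-Length: 2\r\n\r\n"
        "42";

    static const char RESP_404[] =
        "HTTP/1.1 404 Not Found\r\n"
        "Server: bench\r\n"
        "Connection: keep-alive\r\n"
        "Content-Length: 0\r\n\r\n";

    static void client_cb(struct ev_loop *loop, ev_io *w, int revents)
    {
        char buf[4096];
        ssize_t n = recv(w->fd, buf, sizeof(buf) - 1, 0);

        if (n <= 0) {                   /* EOF or error: drop the connection */
            ev_io_stop(loop, w);
            close(w->fd);
            free(w);
            return;
        }
        buf[n] = '\0';

        /* requirements 2 and 3: method GET, path /answer; otherwise 404 */
        if (strncmp(buf, "GET /answer ", 12) == 0)
            send(w->fd, RESP_200, sizeof(RESP_200) - 1, 0);
        else
            send(w->fd, RESP_404, sizeof(RESP_404) - 1, 0);
        /* keep-alive: the socket stays registered for the next request */
    }

    static void accept_cb(struct ev_loop *loop, ev_io *w, int revents)
    {
        int fd = accept(w->fd, NULL, NULL);
        if (fd < 0)
            return;
        fcntl(fd, F_SETFL, O_NONBLOCK);

        ev_io *cw = malloc(sizeof(*cw));    /* one watcher per connection */
        ev_io_init(cw, client_cb, fd, EV_READ);
        ev_io_start(loop, cw);
    }

    int main(void)
    {
        struct ev_loop *loop = EV_DEFAULT;
        struct sockaddr_in sa;
        int one = 1;

        int fd = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port = htons(8000);          /* requirement 1: port 8000 */
        sa.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(fd, (struct sockaddr *)&sa, sizeof(sa));
        listen(fd, SOMAXCONN);

        ev_io accept_w;
        ev_io_init(&accept_w, accept_cb, fd, EV_READ);
        ev_io_start(loop, &accept_w);
        ev_run(loop, 0);                    /* single-threaded event loop */
        return 0;
    }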
In cases where a server did not support keep-alive connections (incidentally, only cpp-netlib stood out in this way), testing was carried out in the corresponding (non-keep-alive) mode.

Background


Initially, the task was to implement an HTTP server handling a load of hundreds of millions of hits per day. It was assumed that there would be a relatively small number of clients generating 90% of the requests, and a large number of clients generating the remaining 10%. Each request had to be forwarded to several other servers, the responses collected, and the result returned to the client. The success of the entire project depended on the speed and quality of the response, so it was simply not possible to take the first available ready-made solution. Answers were needed to the following questions:
  1. Is it worth reinventing the wheel, or should an existing solution be used?
  2. Is node.js suitable for high-load projects? If so, throw out the thickets of C++ code and rewrite everything in 30 lines of JS.

There were also less significant questions, for example: does HTTP keep-alive affect performance? (A year later the answer was voiced here: it does, and very significantly.)

Of course, first my own wheel was invented; then node.js appeared (I learned about it two years ago), and I wanted to find out how much more effective the existing solutions were than my own, and whether the time had been wasted. That is how this post came about.

Preparation


Hardware
  • CPU: AMD FX(tm)-8120 Eight-Core Processor
  • Network: localhost (why: see TODO)

Software
  • OS: FreeBSD 9.1-RELEASE-p7

Tuning
Usually, when load testing network applications, it is customary to change the following standard set of settings:
/etc/sysctl.conf
kern.ipc.somaxconn = 65535
net.inet.tcp.blackhole = 2
net.inet.udp.blackhole = 1
net.inet.ip.portrange.randomized = 0
net.inet.ip.portrange.first = 1024
net.inet.ip.portrange.last = 65535
net.inet.icmp.icmplim = 1000

/boot/loader.conf
kern.ipc.semmni = 256
kern.ipc.semmns = 512
kern.ipc.semmnu = 256
kern.ipc.maxsockets = 999999
kern.ipc.nmbclusters = 65535
kern.ipc.somaxconn = 65535
kern.maxfiles = 999999
kern.maxfilesperproc = 999999
kern.maxvnodes = 999999
net.inet.tcp.fast_finwait2_recycle = 1

However, in my testing these changes did not improve performance, and in some cases even led to a significant slowdown, so in the final tests no system settings were changed (i.e. all defaults, GENERIC kernel).
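If needed, the effective values can still be verified during a run via the sysctl(3) interface; a minimal check (illustrative):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int main(void)
    {
        int somaxconn;
        size_t len = sizeof(somaxconn);

        /* read the current listen-queue limit */
        if (sysctlbyname("kern.ipc.somaxconn", &somaxconn, &len, NULL, 0) == 0)
            printf("kern.ipc.somaxconn = %d\n", somaxconn);
        return 0;
    }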

Participants


Libraries

Name | Version | Events | Keep-alive support | Mechanism
cpp-netlib | 0.10.1 | Boost.asio | no | multithreaded
hand-made | 11/11/30 | libev | yes | multiprocess (one thread per process), asynchronous
libevent | 2.0.21 | libevent | yes | single-threaded*, asynchronous
mongoose | 5.0 | select | yes | single-threaded, asynchronous, with a list (more)
onion | 0.5 | libev | yes | multithreaded
Pion network library | 0.5.4 | Boost.asio | yes | multithreaded
POCO C++ Libraries | 1.4.3 | select | yes | multithreaded (separate thread for incoming connections), with a queue (more)

Ready-made solutions

Name | Version | Events | Keep-alive support | Mechanism
node.js | 0.10.17 | libuv | yes | cluster module (multiprocess)
nginx | 1.4.4 | epoll, select, kqueue | yes | multiprocess

* Reworked for the tests into a "multiprocess, one thread per process" scheme.

Disqualified

Name | Reason
nxweb | Linux only
g-wan | Linux only (and in general ... )
libmicrohttpd | constant crashes under load
yield | compilation errors
EHS | compilation errors
libhttpd | synchronous, HTTP/1.0, does not allow changing headers
libebb | compilation errors, crashes

As the client, we used weighttp, a tool from the lighttpd developers. Originally the plan was to use httperf as a more flexible tool, but it constantly crashes. Besides, weighttp is based on libev, which suits FreeBSD much better than httperf with its select. As the main test script (a wrapper around weighttp that also accounts for resource consumption, etc.), we considered G-WAN's ab.c, adapted for FreeBSD, but later it was rewritten from scratch in Python (bench.py in the appendix).

The client and the server were run on the same physical machine.
The variable parameters were:
  • number of server threads (1, 2, and 3);
  • number of concurrently open client connections (10, 100, 200, 400, 800).

In each configuration, 20-30 iterations were performed, with 2 million requests per iteration.
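A typical client invocation might look like the line below (the exact values are illustrative; -n is the total number of requests, -c the number of concurrent connections, -t the number of client threads, -k enables keep-alive):

    weighttp -n 2000000 -c 100 -t 2 -k "http://127.0.0.1:8000/answer"

For the servers without keep-alive support, the same run would be repeated without -k.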

Results


The first version of this article contained gross violations of testing methodology, as pointed out in the comments by users VBart and wentout. In particular, tasks were not strictly pinned to CPU cores, and the total number of server/client threads exceeded reasonable limits. Also, features affecting the measurements (AMD Turbo Core) were not disabled, and measurement errors were not reported. The current version of the article uses the approach described here.

For the servers running in single-threaded mode, the following results were obtained (the maximum of the medians over the server/client thread combinations was taken):

Place | Name | Client connections | User time (%) | System time (%) | Successful requests (/sec) | Unsuccessful (%)
1 | nginx | 400 | 10 | 10 | 101210 | 0
2 | mongoose | 200 | 12 | 15 | 53255 | 0
3 | libevent | 200 | 16 | 33 | 39882 | 0
4 | hand-made | 100 | 20 | 32 | 38550 | 0
5 | onion | 10 | 22 | 33 | 29230 | 0
6 | POCO | 10 | 25 | 50 | 20943 | 0
7 | pion | 10 | 24 | 83 | 16526 | 0
8 | node.js | 10 | 23 | 17 | 3937 | 40
9 | cpp-netlib | 10 | 100 | 18 | 3536 | 20

Scalability:

In theory, with more cores we would observe a linear increase in performance. Unfortunately, this theory cannot be verified here: there are not enough cores.

Frankly speaking, nginx surprised me: in essence it is a ready-made, multifunctional, modular solution, and yet its results are an order of magnitude better than those of the highly specialized libraries. Respect.

mongoose is still raw: version 5.0 has not been polished yet, and the branch is in active development.

cpp-netlib showed the worst result. Not only was it the only participant without HTTP keep-alive support, it also crashed somewhere in the bowels of Boost, which made it problematic to run all the iterations in a row. The solution is definitely raw, and the documentation is outdated. A well-deserved last place.

node.js has already been scolded here; I will not be quite so categorical, but V8 clearly still needs a lot of work. What kind of high-load solution is it, if even with no payload it consumes resources this eagerly while delivering 10-20% of the throughput of the top test participants?

HTTP keep-alive on/off: where the post referenced above saw a difference of up to 2x, in my tests the difference reached 10x.

Accuracy according to ministat: no difference proven at 95.0% confidence.
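For reference, ministat(1) ships with FreeBSD and compares samples given as files with one measurement per line; a typical invocation (the file names are hypothetical):

    ministat -c 95 nginx_rps.txt mongoose_rps.txt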

TODO


  • A benchmark in the "client and server on different machines" mode. Care is needed here: everything can bottleneck on the network hardware, and not only the network card models but also the switches, routers, etc., the entire infrastructure between the real machines. For a start, a direct connection could be tried;
  • Testing the client-side HTTP API (organized as a server plus a proxy). The problem is that not all libraries provide an API for implementing an HTTP client. On the other hand, some popular libraries (libcurl, for example) provide an exclusively client-side API;
  • Using other HTTP clients. httperf was not used for the reasons given above; ab, according to many reviews, is outdated and cannot sustain real loads. Here a couple of dozen solutions are listed, some of which would be worth comparing;
  • A similar benchmark in a Linux environment. That should be an interesting topic (at the very least, a new wave of holy-war discussions);
  • Running the tests on a top-end Intel Xeon with plenty of cores.


References


Stress-testing with httperf, siege, apache benchmark, and pronk - HTTP clients for load testing of servers.
Performance Testing with Httperf - tips and tricks on benchmarking.
ApacheBench & HTTPerf - a description of the benchmarking process from G-WAN.
Warp - another high-load HTTP server, written in Haskell.

Appendix


The appendix contains the sources and the results of all testing iterations, as well as detailed information on building and installing the HTTP servers.
