Benchmark of HTTP servers (C/C++) on FreeBSD



We compared the performance of HTTP server cores built with seven C/C++ libraries, as well as (for educational purposes) some ready-made solutions in this area (nginx and node.js).

An HTTP server is a complex and interesting mechanism. There is a saying that a programmer who has never written a compiler is a poor programmer; I would replace "compiler" with "HTTP server": it involves parsing, networking, asynchronous multithreading, and much more...

Testing across all possible parameters (serving static content, dynamic content, various encryption modules, proxying, etc.) would take more than a month of painstaking work, so the task was simplified: we compare the performance of the cores. The core of an HTTP server (like that of any network application) is the socket event manager plus some primary mechanism for processing those events (implemented as a pool of threads, processes, etc.). It also includes the HTTP request parser and the response generator. At first glance everything should reduce to testing the capabilities of one or another system mechanism for handling asynchronous events (select, epoll, kqueue, etc.), their meta-wrappers (libev, boost.asio, etc.) and the OS kernel; in practice, however, a specific turnkey implementation produces a significant difference in performance.
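For reference: on FreeBSD the native socket event mechanism is kqueue(2), and the meta-wrappers listed above are essentially portable facades over it (or over epoll/select on other systems). A minimal sketch of such an event loop in C, purely illustrative, with error handling omitted:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    /* Minimal kqueue(2) skeleton: register a listening socket and
     * dispatch readability events. Illustrative only. */
    void event_loop(int listen_fd)
    {
        int kq = kqueue();
        struct kevent change, events[64];

        /* register interest in "readable" on the listening socket */
        EV_SET(&change, listen_fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
        kevent(kq, &change, 1, NULL, 0, NULL);

        for (;;) {
            int n = kevent(kq, NULL, 0, events, 64, NULL);
            for (int i = 0; i < n; i++) {
                if ((int)events[i].ident == listen_fd) {
                    /* accept(2) the connection and register it the same way */
                } else {
                    /* read the request, write the response */
                }
            }
        }
    }

libev, libevent and Boost.asio hide exactly this kind of loop behind a callback API; the test participants differ in what they build on top of it.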

I implemented my own version of the HTTP server on top of libev. Of course, it supports only a small subset of the requirements of the notorious RFC 2616 (it is unlikely that any HTTP server fully implements it), just the minimum needed to meet the requirements for the participants of this test:

  1. Listen for requests on port 8000;
  2. Check the method (GET);
  3. Check the path in the request (/answer);
  4. The response must contain:
                HTTP/1.1 200 OK
                Server: bench
                Connection: keep-alive
                Content-Type: text/plain
                Content-Length: 2

                42

  5. For any other method or path, a response with error code 404 (Not Found) must be returned.

As you can see: no extensions, no disk file access, no gateway interfaces, etc.; everything is simplified as much as possible.
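To make the test target concrete, a single-threaded libev-based core meeting the requirements above might look roughly like the sketch below. This is not the exact code used in the tests, just a minimal illustration assuming libev 4.x (built with -lev); real request parsing, partial reads, and error handling are omitted.

    /* bench_server.c: minimal sketch of the test target (libev 4.x).
     * We naively assume the whole request arrives in one read. */
    #include <ev.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static const char RESP_200[] =
        "HTTP/1.1 200 OK\r\n"
        "Server: bench\r\n"
        "Connection: keep-alive\r\n"
        "Content-Type: text/plain\r\n"
        "Content-Length: 2\r\n\r\n"
        "42";

    static const char RESP_404[] =
        "HTTP/1.1 404 Not Found\r\n"
        "Server: bench\r\n"
        "Connection: keep-alive\r\n"
        "Content-Length: 0\r\n\r\n";

    static void client_cb(struct ev_loop *loop, ev_io *w, int revents)
    {
        char buf[4096];
        ssize_t n = recv(w->fd, buf, sizeof(buf) - 1, 0);

        if (n <= 0) {                   /* EOF or error: drop the connection */
            ev_io_stop(loop, w);
            close(w->fd);
            free(w);
            return;
        }
        buf[n] = '\0';

        /* requirements 2 and 3: method GET, path /answer; otherwise 404 */
        if (strncmp(buf, "GET /answer ", 12) == 0)
            send(w->fd, RESP_200, sizeof(RESP_200) - 1, 0);
        else
            send(w->fd, RESP_404, sizeof(RESP_404) - 1, 0);
        /* keep-alive: the socket stays registered for the next request */
    }

    static void accept_cb(struct ev_loop *loop, ev_io *w, int revents)
    {
        int fd = accept(w->fd, NULL, NULL);
        if (fd < 0)
            return;
        fcntl(fd, F_SETFL, O_NONBLOCK);

        ev_io *cw = malloc(sizeof(*cw));    /* one watcher per connection */
        ev_io_init(cw, client_cb, fd, EV_READ);
        ev_io_start(loop, cw);
    }

    int main(void)
    {
        struct ev_loop *loop = EV_DEFAULT;
        struct sockaddr_in sa;
        int one = 1;

        int fd = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port = htons(8000);          /* requirement 1: port 8000 */
        sa.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(fd, (struct sockaddr *)&sa, sizeof(sa));
        listen(fd, SOMAXCONN);

        ev_io accept_w;
        ev_io_init(&accept_w, accept_cb, fd, EV_READ);
        ev_io_start(loop, &accept_w);
        ev_run(loop, 0);                    /* single-threaded event loop */
        return 0;
    }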
In cases where a server did not support keep-alive connections (incidentally, only cpp-netlib stood out in this way), testing was carried out in the corresponding (non-keep-alive) mode.

Background


Initially, the task was to implement an HTTP server handling a load of hundreds of millions of hits per day. It was assumed that there would be a relatively small number of clients generating 90% of the requests, and a large number of clients generating the remaining 10%. Each request had to be forwarded to several other servers, the responses collected, and the result returned to the client. The success of the entire project depended on the speed and quality of the response, so it was simply not possible to take the first available ready-made solution. Answers were needed to the following questions:
  1. Is it worth reinventing the wheel, or should an existing solution be used?
  2. Is node.js suitable for high-load projects? If so, throw out the thickets of C++ code and rewrite everything in 30 lines of JS.

There were also less significant questions, for example: does HTTP keep-alive affect performance? (A year later the answer was voiced here: it does, and very significantly.)

Of course, first my own wheel was invented; then node.js appeared (I learned about it two years ago), and I wanted to find out how much more effective the existing solutions were than my own, and whether the time had been wasted. That is how this post came about.

Preparation


Hardware
  • CPU: AMD FX(tm)-8120 Eight-Core Processor
  • Network: localhost (why: see TODO)

Software
  • OS: FreeBSD 9.1-RELEASE-p7

Tuning
Usually, when load testing network applications, it is customary to change the following standard set of settings:
/etc/sysctl.conf
kern.ipc.somaxconn = 65535
net.inet.tcp.blackhole = 2
net.inet.udp.blackhole = 1
net.inet.ip.portrange.randomized = 0
net.inet.ip.portrange.first = 1024
net.inet.ip.portrange.last = 65535
net.inet.icmp.icmplim = 1000

/boot/loader.conf
kern.ipc.semmni = 256
kern.ipc.semmns = 512
kern.ipc.semmnu = 256
kern.ipc.maxsockets = 999999
kern.ipc.nmbclusters = 65535
kern.ipc.somaxconn = 65535
kern.maxfiles = 999999
kern.maxfilesperproc = 999999
kern.maxvnodes = 999999
net.inet.tcp.fast_finwait2_recycle = 1

However, in my testing these changes did not improve performance, and in some cases even led to a significant slowdown, so in the final tests no system settings were changed (i.e. all defaults, GENERIC kernel).
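If needed, the effective values can still be verified during a run via the sysctl(3) interface; a minimal check (illustrative):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int main(void)
    {
        int somaxconn;
        size_t len = sizeof(somaxconn);

        /* read the current listen-queue limit */
        if (sysctlbyname("kern.ipc.somaxconn", &somaxconn, &len, NULL, 0) == 0)
            printf("kern.ipc.somaxconn = %d\n", somaxconn);
        return 0;
    }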

Participants


Libraries

Name | Version | Events | Keep-alive support | Mechanism
cpp-netlib | 0.10.1 | Boost.asio | no | multithreaded
hand-made | 11/11/30 | libev | yes | multiprocess (one thread per process), asynchronous
libevent | 2.0.21 | libevent | yes | single-threaded*, asynchronous
mongoose | 5.0 | select | yes | single-threaded, asynchronous, with a list (more)
onion | 0.5 | libev | yes | multithreaded
Pion network library | 0.5.4 | Boost.asio | yes | multithreaded
POCO C++ Libraries | 1.4.3 | select | yes | multithreaded (separate thread for incoming connections), with a queue (more)

Ready-made solutions

Name | Version | Events | Keep-alive support | Mechanism
node.js | 0.10.17 | libuv | yes | cluster module (multiprocess)
nginx | 1.4.4 | epoll, select, kqueue | yes | multiprocess

* Reworked for the tests into a "multiprocess, one thread per process" scheme.

Disqualified

Name | Reason
nxweb | Linux only
g-wan | Linux only (and in general ... )
libmicrohttpd | constant crashes under load
yield | compilation errors
EHS | compilation errors
libhttpd | synchronous, HTTP/1.0, does not allow changing headers
libebb | compilation errors, crashes

As the client, we used weighttp, a tool from the lighttpd developers. Originally the plan was to use httperf as a more flexible tool, but it constantly crashes. Besides, weighttp is based on libev, which suits FreeBSD much better than httperf with its select. As the main test script (a wrapper around weighttp that also accounts for resource consumption, etc.), we considered G-WAN's ab.c, adapted for FreeBSD, but later it was rewritten from scratch in Python (bench.py in the appendix).

The client and the server were run on the same physical machine.
The variable parameters were:
  • number of server threads (1, 2, and 3);
  • number of concurrently open client connections (10, 100, 200, 400, 800).

In each configuration, 20-30 iterations were performed, with 2 million requests per iteration.
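A typical client invocation might look like the line below (the exact values are illustrative; -n is the total number of requests, -c the number of concurrent connections, -t the number of client threads, -k enables keep-alive):

    weighttp -n 2000000 -c 100 -t 2 -k "http://127.0.0.1:8000/answer"

For the servers without keep-alive support, the same run would be repeated without -k.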

Results


The first version of this article contained gross violations of testing methodology, as pointed out in the comments by users VBart and wentout. In particular, tasks were not strictly pinned to CPU cores, and the total number of server/client threads exceeded reasonable limits. Also, features affecting the measurements (AMD Turbo Core) were not disabled, and measurement errors were not reported. The current version of the article uses the approach described here.

For the servers running in single-threaded mode, the following results were obtained (the maximum of the medians over the server/client thread combinations was taken):

Place | Name | Client connections | User time (%) | System time (%) | Successful requests (/sec) | Unsuccessful (%)
1 | nginx | 400 | 10 | 10 | 101210 | 0
2 | mongoose | 200 | 12 | 15 | 53255 | 0
3 | libevent | 200 | 16 | 33 | 39882 | 0
4 | hand-made | 100 | 20 | 32 | 38550 | 0
5 | onion | 10 | 22 | 33 | 29230 | 0
6 | POCO | 10 | 25 | 50 | 20943 | 0
7 | pion | 10 | 24 | 83 | 16526 | 0
8 | node.js | 10 | 23 | 17 | 3937 | 40
9 | cpp-netlib | 10 | 100 | 18 | 3536 | 20

Scalability:

In theory, with more cores we would observe a linear increase in performance. Unfortunately, this theory cannot be verified here: there are not enough cores.

Frankly speaking, nginx surprised me: in essence it is a ready-made, multifunctional, modular solution, and yet its results are an order of magnitude better than those of the highly specialized libraries. Respect.

mongoose is still raw: version 5.0 has not been polished yet, and the branch is in active development.

cpp-netlib showed the worst result. Not only was it the only participant without HTTP keep-alive support, it also crashed somewhere in the bowels of Boost, which made it problematic to run all the iterations in a row. The solution is definitely raw, and the documentation is outdated. A well-deserved last place.

node.js has already been scolded here; I will not be quite so categorical, but V8 clearly still needs a lot of work. What kind of high-load solution is it, if even with no payload it consumes resources this eagerly while delivering 10-20% of the throughput of the top test participants?

HTTP keep-alive on/off: where the post referenced above saw a difference of up to 2x, in my tests the difference reached 10x.

Accuracy according to ministat: no difference proven at 95.0% confidence.
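For reference, ministat(1) ships with FreeBSD and compares samples given as files with one measurement per line; a typical invocation (the file names are hypothetical):

    ministat -c 95 nginx_rps.txt mongoose_rps.txt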

TODO


  • A benchmark in the "client and server on different machines" mode. Care is needed here: everything can bottleneck on the network hardware, and not only the network card models but also the switches, routers, etc., the entire infrastructure between the real machines. For a start, a direct connection could be tried;
  • Testing the client-side HTTP API (organized as a server plus a proxy). The problem is that not all libraries provide an API for implementing an HTTP client. On the other hand, some popular libraries (libcurl, for example) provide an exclusively client-side API;
  • Using other HTTP clients. httperf was not used for the reasons given above; ab, according to many reviews, is outdated and cannot sustain real loads. Here a couple of dozen solutions are listed, some of which would be worth comparing;
  • A similar benchmark in a Linux environment. That should be an interesting topic (at the very least, a new wave of holy-war discussions);
  • Running the tests on a top-end Intel Xeon with plenty of cores.


References


Stress-testing with httperf, siege, apache benchmark, and pronk - HTTP clients for load testing of servers.
Performance Testing with Httperf - tips and tricks on benchmarking.
ApacheBench & HTTPerf - a description of the benchmarking process from G-WAN.
Warp - another high-load HTTP server, written in Haskell.

Appendix


The appendix contains the sources and the results of all testing iterations, as well as detailed information on building and installing the HTTP servers.
