
A nuclear reactor for every site
Everyone has heard that PHP was created to die. This is not entirely true: if you want, PHP can refuse to die, can work asynchronously, and can even support honest multithreading, just not all at once. This time we'll talk about how to make it live long, and a nuclear reactor will help us with that!

The nuclear reactor in question is the ReactPHP project, whose description reads "Nuclear Reactor written in PHP". This article is what prompted me to get acquainted with it (the picture above comes from there). I re-read it several times over the course of a year but never got around to trying it in practice, although the promised performance growth of more than an order of magnitude looked very appealing in the long run.
The initial state
The test system is CleverStyle CMS with the APCu caching engine, a development version with all available components installed; the tests open a page of the Static pages module.
The test machine is a work laptop with a Core i7-4900MQ (4 cores, 8 threads) running Ubuntu 15.04 x64. The disk subsystem is two SATA3 SSDs in software RAID0 with btrfs, which is not the best option for a database and turned out to be a real bottleneck in these tests, but it is what it is. Before each test sudo sync is run; each request makes 2-4 database queries (creating a visitor session; these are not cached at the database level); Nginx runs 16 workers.
The conditions are hardly laboratory-grade, but we have to work with something :)
We will measure performance with the simple ApacheBench tool.
First, PHP-FPM (PHP 5.5, 16 workers, static process management):
nazar-pc@nazar-pc ~> ab -n5000 -c128 cscms.org:8080/uk
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, www.zeustech.net
Licensed to The Apache Software Foundation, www.apache.org

Benchmarking cscms.org (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests

Server Software:        nginx/1.6.2
Server Hostname:        cscms.org
Server Port:            8080

Document Path:          /uk
Document Length:        99320 bytes

Concurrency Level:      128
Time taken for tests:   22.280 seconds
Complete requests:      5000
Failed requests:        4239
   (Connect: 0, Receive: 0, Length: 4239, Exceptions: 0)
Total transferred:      498328949 bytes
HTML transferred:       496603949 bytes
Requests per second:    224.41 [#/sec] (mean)
Time per request:       570.373 [ms] (mean)
Time per request:       4.456 [ms] (mean, across all concurrent requests)
Transfer rate:          21842.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       3
Processing:    26  563 101.6    541     880
Waiting:       24  559 101.3    537     872
Total:         30  564 101.4    541     881

Percentage of the requests served within a certain time (ms)
  50%    541
  66%    559
  75%    572
  80%    584
  90%    759
  95%    795
  98%    817
  99%    829
 100%    881 (longest request)
Concurrency is 128 because at 256 PHP-FPM simply falls over.
Now HHVM; first we warm HHVM up with 50,000 requests (why), then run the test:
nazar-pc@nazar-pc ~> ab -n5000 -c256 cscms.org:8000/uk
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, www.zeustech.net
Licensed to The Apache Software Foundation, www.apache.org

Benchmarking cscms.org (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests

Server Software:        nginx/1.6.2
Server Hostname:        cscms.org
Server Port:            8000

Document Path:          /uk
Document Length:        99309 bytes

Concurrency Level:      256
Time taken for tests:   20.418 seconds
Complete requests:      5000
Failed requests:        962
   (Connect: 0, Receive: 0, Length: 962, Exceptions: 0)
Total transferred:      498398875 bytes
HTML transferred:       496543875 bytes
Requests per second:    244.88 [#/sec] (mean)
Time per request:       1045.408 [ms] (mean)
Time per request:       4.084 [ms] (mean, across all concurrent requests)
Transfer rate:          23837.54 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.5      0       8
Processing:   505 1019 102.6   1040    1582
Waiting:      505 1017 102.9   1039    1579
Total:        513 1019 102.5   1040    1586

Percentage of the requests served within a certain time (ms)
  50%   1040
  66%   1068
  75%   1080
  80%   1087
  90%   1108
  95%   1126
  98%   1179
  99%   1397
 100%   1586 (longest request)
We got 245 requests per second, and this is the number to work against.
First steps
I want the code not to depend on whether it runs under an HTTP server written in PHP or in the more familiar mode.
To achieve this, headers_list()/header_remove() and http_response_code() were put to work, and the superglobals $_GET, $_POST, $_REQUEST, $_COOKIE and $_SERVER were filled in manually.
System objects were destroyed after each request and created anew.
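Roughly, this first version looked like the following minimal sketch. It assumes the ReactPHP 0.4-era HTTP API (React\Http\Server emitting a request event); handle_request() stands in for the CMS bootstrap and is a hypothetical name:

<?php
require 'vendor/autoload.php';

$loop   = React\EventLoop\Factory::create();
$socket = new React\Socket\Server($loop);
$http   = new React\Http\Server($socket, $loop);

$http->on('request', function ($request, $response) {
    // Fill the superglobals by hand, the way a regular SAPI would
    $_GET     = $request->getQuery();
    $_POST    = [];                        // no multipart support yet
    $_COOKIE  = [];                        // would be parsed from the Cookie header
    $_SERVER  = [
        'REQUEST_METHOD' => $request->getMethod(),
        'REQUEST_URI'    => $request->getPath()
    ];
    $_REQUEST = $_GET + $_POST;

    $body = handle_request();              // run the CMS, collect rendered output
    $response->writeHead(
        http_response_code() ?: 200,
        ['Content-Type' => 'text/html; charset=utf-8']
    );
    $response->end($body);
    header_remove();                       // reset global header state for the next request
});

$socket->listen(9990);
$loop->run();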
In general, it worked, but there were nuances:
- If asynchronous operations let more than one request execute at the same time, everything falls apart
- Re-creating all the system objects still added noticeable overhead, although it was faster than a full script restart
- It would not run under PHP-CLI: sending headers requires PHP-CGI, which (for reasons unknown) leaks memory in a long-running process
- If anyone calls exit()/die(), everything dies
Optimization, asynchronous support
First, system objects were divided into two groups: those that depend on the user and the specific request, and those that are completely independent of it.
Independent objects are no longer destroyed after each request, which gave a significant speed-up.
The object that receives a request from ReactPHP and generates the response gained an extra __request_id field. When a request-dependent system object is requested, debug_backtrace() is used to find this __request_id, which makes it possible to keep such objects separate for each individual request, even with asynchronous execution.
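In simplified form, the trick looks something like this sketch of the Singleton trait; the instance() name and the storage layout are assumptions, the real version lives in Singleton.php:

<?php
trait Singleton {
    protected static $instances = [];

    public static function instance () {
        // Walk up the call stack until we meet the object that owns the
        // current request; its __request_id tells us which request we serve
        $request_id = 0;
        foreach (debug_backtrace(DEBUG_BACKTRACE_PROVIDE_OBJECT) as $frame) {
            if (isset($frame['object']->__request_id)) {
                $request_id = $frame['object']->__request_id;
                break;
            }
        }
        // One instance per (class, request) pair instead of one per class
        $key = static::class."/$request_id";
        if (!isset(static::$instances[$key])) {
            static::$instances[$key] = new static;
        }
        return static::$instances[$key];
    }
}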
System functions that work with global state were singled out separately; for the HTTP server, modified versions that take __request_id into account are loaded instead. _header() was added in place of header() (so that headers work under PHP-CLI), _http_response_code() in place of http_response_code(), and the existing _getcookie() and _setcookie() were adapted; the latter now builds the cookie-changing headers by hand under the hood and sends them through _header().
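A sketch of what the substituted functions might look like; the per-request storage and the _current_request_id() helper (the backtrace lookup shown above) are illustrative, the real signatures live in functions.php:

<?php
class Request_state {
    public static $headers = [];   // $request_id => list of header lines
    public static $code    = [];   // $request_id => HTTP status code
}

function _header ($string, $replace = true) {
    // Instead of the SAPI header table (unavailable under PHP-CLI),
    // headers are accumulated per request and sent by the HTTP server later
    Request_state::$headers[_current_request_id()][] = [$string, $replace];
}

function _http_response_code ($code = null) {
    $request_id = _current_request_id();
    if ($code !== null) {
        Request_state::$code[$request_id] = $code;
    }
    return isset(Request_state::$code[$request_id]) ? Request_state::$code[$request_id] : 200;
}

function _setcookie ($name, $value = '', $expire = 0, $path = '', $domain = '', $secure = false, $httponly = false) {
    // Build the Set-Cookie header by hand and route it through _header()
    $header = 'Set-Cookie: '.$name.'='.rawurlencode($value);
    if ($expire)   { $header .= '; Expires='.gmdate('D, d-M-Y H:i:s T', $expire); }
    if ($path)     { $header .= '; Path='.$path; }
    if ($domain)   { $header .= '; Domain='.$domain; }
    if ($secure)   { $header .= '; Secure'; }
    if ($httponly) { $header .= '; HttpOnly'; }
    _header($header, false);
}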
The superglobals are replaced with array-like objects; accessing an element of such a pseudo-array returns the data belonging to the current request. Compatibility with regular code is high here: the main thing is not to overwrite the superglobals themselves and to keep in mind that they are not real arrays (which matters, for example, when one is passed to array_merge()).
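A minimal sketch of such an array-like object, again assuming the illustrative _current_request_id() helper; the real implementation is Superglobals_wrapper.php:

<?php
class Superglobal implements ArrayAccess {
    protected $storage = [];   // $request_id => data of that request

    public function offsetExists ($key) {
        return isset($this->storage[_current_request_id()][$key]);
    }
    public function offsetGet ($key) {
        $request_id = _current_request_id();
        return isset($this->storage[$request_id][$key]) ? $this->storage[$request_id][$key] : null;
    }
    public function offsetSet ($key, $value) {
        $this->storage[_current_request_id()][$key] = $value;
    }
    public function offsetUnset ($key) {
        unset($this->storage[_current_request_id()][$key]);
    }
}

// $_GET['id'] keeps working as before, but $_GET is no longer a real array:
// array_merge($_GET, $defaults) would fail and needs an explicit copy first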
As another compromise, an \ExitException was added to the system to replace exit()/die() calls (third-party libraries are modified where necessary, except where the whole script really does need to terminate). This makes it possible to intercept the output at the very top level and avoid terminating the script.
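The mechanics are roughly as follows (a sketch; _exit() and handle_request() are illustrative names):

<?php
class ExitException extends Exception {}

// exit()/die() calls in the code base are replaced with _exit()
function _exit ($code = 0) {
    throw new ExitException('', (int) $code);
}

// At the very top, inside the HTTP server's request handler:
ob_start();
try {
    handle_request();            // deep inside, _exit() may be called
} catch (ExitException $e) {
    // swallow it: the request ends, the process lives on
}
$response->end(ob_get_clean());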
We test the result on a pool of 16 running HTTP servers (HHVM as the interpreter), with Nginx balancing the requests (after warming the pool up with 50,000 requests):
nazar-pc@nazar-pc ~> ab -n5000 -c256 cscms.org:9990/uk
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, www.zeustech.net
Licensed to The Apache Software Foundation, www.apache.org

Benchmarking cscms.org (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests

Server Software:        nginx/1.6.2
Server Hostname:        cscms.org
Server Port:            9990

Document Path:          /uk
Document Length:        99323 bytes

Concurrency Level:      256
Time taken for tests:   16.092 seconds
Complete requests:      5000
Failed requests:        1646
   (Connect: 0, Receive: 0, Length: 1646, Exceptions: 0)
Total transferred:      498418546 bytes
HTML transferred:       496643546 bytes
Requests per second:    310.71 [#/sec] (mean)
Time per request:       823.928 [ms] (mean)
Time per request:       3.218 [ms] (mean, across all concurrent requests)
Transfer rate:          30246.49 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.9      0       6
Processing:   100  804 308.3    750    2287
Waiting:       79  804 308.2    750    2285
Total:        106  804 308.1    750    2287

Percentage of the requests served within a certain time (ms)
  50%    750
  66%    841
  75%    942
  80%    990
  90%   1180
  95%   1381
  98%   1720
  99%   1935
 100%   2287 (longest request)
Not bad already: 310 requests per second is 1.26 times more than HHVM in normal mode.
Optimizing further
Since the code was not written asynchronously in the first place, one request cannot overtake another, so we can add a normal, non-asynchronous mode and assume that requests are executed strictly one after another.
In this case we can get by with ordinary arrays in the superglobals, we no longer need debug_backtrace() when creating system objects, and some system objects can be partially reinitialized instead of being re-created from scratch, and kept alive as well.
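In synchronous mode the handler shrinks to something like this sketch; Session::instance()->reinit() is an illustrative stand-in for the partial reinitialization:

<?php
$http->on('request', function ($request, $response) {
    // Requests are processed strictly one at a time, so plain arrays are safe
    $_GET                   = $request->getQuery();
    $_SERVER['REQUEST_URI'] = $request->getPath();

    // Request-bound objects are partially reinitialized instead of re-created;
    // no debug_backtrace() is needed to tell requests apart
    Session::instance()->reinit();

    $response->writeHead(_http_response_code(), []);
    $response->end(handle_request());
});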
Here is what that gives on a pool of 16 running HTTP servers (HHVM), with Nginx balancing the requests (after warming the pool up with 50,000 requests):
nazar-pc@nazar-pc ~> ab -n5000 -c256 cscms.org:9990/uk
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, www.zeustech.net
Licensed to The Apache Software Foundation, www.apache.org

Benchmarking cscms.org (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests

Server Software:        nginx/1.6.2
Server Hostname:        cscms.org
Server Port:            9990

Document Path:          /uk
Document Length:        8497 bytes

Concurrency Level:      256
Time taken for tests:   5.716 seconds
Complete requests:      5000
Failed requests:        4983
   (Connect: 0, Receive: 0, Length: 4983, Exceptions: 0)
Total transferred:      44046822 bytes
HTML transferred:       42381822 bytes
Requests per second:    874.69 [#/sec] (mean)
Time per request:       292.676 [ms] (mean)
Time per request:       1.143 [ms] (mean, across all concurrent requests)
Transfer rate:          7524.85 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.9      0       7
Processing:     6  284 215.9    241     976
Waiting:        6  284 215.9    241     976
Total:          6  284 215.8    241     976

Percentage of the requests served within a certain time (ms)
  50%    241
  66%    337
  75%    409
  80%    442
  90%    623
  95%    728
  98%    829
  99%    869
 100%    976 (longest request)
875 requests per second, which is 3.57 times more than the original HHVM setup, which is good news (sometimes it is a couple of hundred requests per second more, sometimes a couple of hundred less, depending on the mood of the desktop, but at the time of writing these were the results).
There are also prospects for an even greater performance increase (for example, keep-alive support and other improvements are expected in ReactPHP), but a lot depends on the project where this is used.
Limitations
Since we maintain maximum compatibility with existing code, in asynchronous mode users' time zones have to be applied explicitly; otherwise date() may return an unexpected result.
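For example (a sketch; $user_timezone stands for a per-user setting):

<?php
// date() relies on the process-wide default time zone, which is shared
// by all in-flight requests, so it cannot be switched per user safely
echo date('H:i');

// An explicit DateTimeZone gives each request its own zone
$now = new DateTime('now', new DateTimeZone($user_timezone)); // e.g. 'Europe/Kiev'
echo $now->format('H:i');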
Also, file uploads are not supported yet, but there are already 2 pull requests for multipart support; once they make it into react/http in the near future, uploads will work here as well.
Pitfalls
The main pitfall in this mode is memory leaks: after 1,000 requests memory consumption is one figure, and after 5,000 it is a couple of megabytes higher.
Tips for catching leaks (a sketch of the measurement loop follows the list):
- Trim the executed code down to a minimum, run 5,000 requests while logging memory usage after each one, and compare the consumption
- Add back a bit more code and repeat
- Keep going until the whole code base is checked; the number of requests can gradually be lowered to 2,000 (to avoid long waits), but when in doubt a few thousand extra requests will not hurt
- Memory consumption may need some requests to stabilize: usually up to 100, although when starting the full system it sometimes took up to 800 requests before the amount of consumed memory stopped growing
- Since the situation is not exactly mainstream, it may turn out that the leak is not in your code but in a third-party library, or even in a PHP extension (PHP-CGI being an example); here one can only wish you luck, and do not forget a supervisor over the server :)
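A minimal sketch of that measurement loop; handle_request() stands for whatever slice of the system is currently being checked:

<?php
const REQUESTS = 5000;

for ($i = 1; $i <= REQUESTS; $i++) {
    handle_request();
    gc_collect_cycles();                   // rule out not-yet-collected cycles
    if ($i % 100 == 0) {
        // Log memory usage every 100 requests; stable numbers mean no leak here
        file_put_contents(
            'memory.log',
            sprintf("%d\t%d\n", $i, memory_get_usage(true)),
            FILE_APPEND
        );
    }
}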
The second pitfall is the database connection: it can drop, so be prepared to re-establish it when it does. With the conventional one-process-per-request approach this simply never comes up, so here it can cause problems right away.
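With PDO and MySQL it might look like this sketch (assuming PDO::ERRMODE_EXCEPTION and illustrative DB_* constants):

<?php
function query_with_retry (PDO &$db, $sql) {
    try {
        return $db->query($sql);
    } catch (PDOException $e) {
        // 'MySQL server has gone away' is the typical symptom after idling
        if (strpos($e->getMessage(), 'server has gone away') === false) {
            throw $e;
        }
        // Re-establish the connection once and retry the query
        $db = new PDO(DB_DSN, DB_USER, DB_PASSWORD, [
            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
        ]);
        return $db->query($sql);
    }
}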
Third, catch errors and do not use exit()/die() unless you really mean exactly that.
Fourth: you need to somehow separate the global state of different requests if you are going to run asynchronous code. If there is no asynchronous code, global state is easy enough to fake; the main thing is to avoid request-dependent constants, static variables in functions and similar things, unless you want to suddenly turn a guest into an admin :)
Conclusion
With this approach a significant performance increase can be achieved either without any changes or with minimal ones (automated search and replace), and with Request/Response frameworks it is even easier to do.
The speed-up depends on the interpreter and on what the code does: with heavy computation HHVM compiles the hot sections into machine code, and with calls to external APIs you can use the less efficient asynchronous mode but fetch the external data asynchronously (if an API call takes hundreds of milliseconds, this gives a significant boost to overall request processing speed).
If you want to try it: in CleverStyle CMS all of this and much more is available out of the box and just works.
Source code
There is not much source code; if you wish, you can adapt it for use in many other systems.
The class in Request.php receives a request from ReactPHP and sends the response; functions.php contains the functions for working with global context (including a few specific to CleverStyle CMS); Superglobals_wrapper.php contains the class used for the array-like superglobal objects; Singleton.php is a modified version of the trait, used instead of the system one to create system objects (it also determines which objects are shared between all requests and which are not).