Advanced techniques as a way to squeeze the maximum out of a server

    Introduction


    [Graph: just a beautiful rrdtool chart =)]
    It's funny, but when a programmer develops a product, he rarely asks himself whether 2000 people can press one button at the same time. But he should. It turns out they can. Oddly enough, most engines written by such programmers behave very badly under heavy load. Who would have thought, but a single extra INSERT, a query not covered by an index, or a sloppy recursive function can raise the load average by almost an order of magnitude.

    In this article I will describe how we, the project's developers, managed to squeeze the maximum out of a single Pentium 4 HT / 512 MB RAM server, holding at peak 700+ simultaneous users on the forum and 120,000 on the tracker. Yes, this project is a torrent tracker. I suggest we immediately leave aside any conversation about copyright; it doesn't interest me. What is really interesting here is the HighLoad.

    To start, I will describe the project as it was:

    • A regular torrent tracker on the TorrentPier engine (aka phpbb 2.x)
    • Server on FreeBSD 6.0
    • Pentium 4 HT / 512Mb RAM
    • Apache Web Server
    • MySQL database
    • All logic in PHP

    That is, practically a classic LAMP stack.

    Briefly, here are the steps we took:
    • Installing an opcode cache on the server
    • Replacing Apache with nginx
    • Caching some intermediate query results outside the RDBMS
    • Rewriting the key part (read: the tracker) in C++
    • Tuning the FreeBSD network stack and updating to the latest -STABLE
    • MySQL optimization
    • BB code caching
    • Adapting the code to use SphinxSearch
    • Code profiling and installing monitoring tools
    • Parsing the MySQL slow query log

    Now about each item in more detail

    Installing an opcode cache on the server


    You always need one! Installing a PHP opcode cache took 15 minutes and yielded a 300%+ performance gain.
    There are several caches to choose from: eAccelerator, xCache, APC, etc. We settled on the latter because of its good speed and its ability to store user data as well.
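    To illustrate, here is roughly how APC's user-data store is used alongside the opcode cache (a minimal sketch; the key name, the 60-second TTL and build_board_stats() are made up for the example):

    <?php
    // Try the shared-memory user cache first; apc_fetch() sets $hit
    $stats = apc_fetch('board_stats', $hit);
    if (!$hit) {
        $stats = build_board_stats();          // hypothetical expensive query
        apc_store('board_stats', $stats, 60);  // keep the result for 60 seconds
    }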

    Replacing Apache with nginx


    Apache is heavy and slow. At first it ran as the main web server; then nginx was put in front of it to serve static files and gzip the responses. Later Apache was dropped altogether in favor of the nginx + php-fpm combination (to be precise, at the time it was spawn_fcgi, but php-fpm is the better option now). This combination was not yet common in production back then, but it worked great for us!
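    For reference, the core of such a setup looks roughly like this (an nginx config sketch; the paths, port and locations are examples, not our actual config):

    server {
        listen 80;
        root   /var/www/forum;                 # example document root
        gzip   on;                             # compress responses on the fly

        location ~ \.php$ {                    # dynamic pages go to PHP via FastCGI
            fastcgi_pass 127.0.0.1:9000;       # php-fpm (back then: spawn_fcgi)
            include      fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        }

        location /images/ {                    # statics served by nginx itself
            expires 7d;
        }
    }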

    Caching some intermediate query results outside the RDBMS


    An RDBMS is evil. It is convenient, but you pay for convenience with speed, and speed is what we need. So we cached some of the results of the most popular, non-critical MySQL queries in APC. I can already foresee the question: why not memcached? How shall I put it... I am tired of even hearing that word; memcached, memcached, memcached, as if it were a panacea for everything; lately people seem to prescribe it for everything short of diarrhea. In our case the choice fell on APC because it does not go through a TCP connection and is therefore many times faster. For now everything runs fine on one server, so we have little need for distributed storage.
    You can pick any other key/value store, and it does not necessarily have to keep its data in RAM.
    But it is quite likely that in your case memcached / memcachedb / memcacheQ will be the best option.
    In general, there was an idea to build a multi-level cache layer in which PHP would look a value up in global variables first, then in APC, then in memcached, and only then fall through to a SELECT against the database. But since we work on the project in the time left over from school/work/family, we have not gotten around to it yet.
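    Still, a minimal sketch of that layered lookup, minus the memcached level (the function name is invented; the old mysql_* API is used because that was the era):

    <?php
    function cached_query($key, $sql, $ttl = 300) {
        static $local = array();               // level 1: per-request PHP memory
        if (isset($local[$key])) {
            return $local[$key];
        }
        $rows = apc_fetch($key, $hit);         // level 2: APC shared memory
        if (!$hit) {
            $res  = mysql_query($sql);         // level 3: finally, the database
            $rows = array();
            while ($row = mysql_fetch_assoc($res)) {
                $rows[] = $row;
            }
            apc_store($key, $rows, $ttl);
        }
        return $local[$key] = $rows;
    }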


    Rewriting the key part (read: the tracker) in C++


    120,000 active peers create a great many connections to nginx, and what is worse, each of them fires up PHP, which in turn hits MySQL. Doesn't that seem like overkill? It seemed so to us too. One of our developers gathered his strength and reworked the XBT Tracker (XBTT) code to serve as a backend for the TorrentPier frontend. It was worth it: the client now talks to the tracker on port 2710, which keeps the peer table in memory, quickly finds what it needs there, does its work and sends the response back to the peer. Once a minute it flushes the results to the database. Everything is fine. +100,000% performance.
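    For context, an announce is a tiny HTTP exchange, which is exactly why a small C++ daemon handles it so cheaply (a schematic example of the BitTorrent announce protocol; values abbreviated):

    GET /announce?info_hash=...&peer_id=...&port=6881&left=0&compact=1 HTTP/1.0

    HTTP/1.0 200 OK

    d8:intervali1800e5:peers6:......e

    The bencoded response carries only the re-announce interval and a packed list of 6-byte ip:port pairs, one per peer.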
    Here are the test results with the announce interval set to 1 minute:
    input (rl0) output
    packets errs bytes packets errs bytes colls drops
    20K 0 2.5M 16K 0 1.5M 0 0

    PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
    10 root 1 171 52 0K 8K RUN 1 538.6H 47.12% idle: cpu1
    6994 root 1 108 0 98140K 96292K CPU0 0 3:57 33.98% xbt_tracker
    11 root 1 171 52 0K 8K RUN 0 595.0H 31.20% idle: cpu0
    35 root 1 -68 -187 0K 8K WAIT 0 17.1H 21.14% irq21: rl0
    12 root 1 -44 -163 0K 8K WAIT 0 482:57 9.96% swi1: net

    [root@****] /usr/ports/devel/google-perftools/> netstat -an | wc -l
    24147

    The price of all this: about 100 MB of memory and 30% load on one CPU core. All in all, it works out that at the same load you could hold about 8 million peers on one machine with a half-hour announce interval.

    Tuning the FreeBSD network stack and updating to the latest -STABLE


    In the latest versions of FreeBSD 6 the 4BSD scheduler was heavily reworked, and FreeBSD 7 has the nice ULE scheduler, with which MySQL runs several times faster on SMP.
    Also, on any high-performance FreeBSD installation you need to tune sysctl; I recommend doing it following Sysoev's recommendations.
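    For reference, these are the kinds of knobs usually touched in /etc/sysctl.conf on such installs (the values are illustrative, not Sysoev's exact numbers; measure before copying anything):

    kern.ipc.somaxconn=4096           # longer listen queue for connection bursts
    kern.maxfiles=65536               # plenty of file descriptors system-wide
    net.inet.tcp.msl=7500             # shorter TIME_WAIT (milliseconds)
    net.inet.tcp.sendspace=32768      # per-socket send buffer
    net.inet.tcp.recvspace=65536      # per-socket receive buffer
    net.inet.ip.portrange.last=65535  # widen the ephemeral port range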

    MySQL optimization


    The database is the wall that sooner or later every project runs into, and we were no exception.
    At the time we used MyISAM, for two reasons:
    • it is used by default
    • it has a FULLTEXT index for searching the forum

    So we spent a lot of time tuning the buffers. Tuning-primer.sh was particularly helpful here.
    In the future we plan to migrate the database to XtraDB. In any case, we are still doing fine: the database fits into memory =)
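    The MyISAM-era knobs in question live in my.cnf and look roughly like this (values are illustrative; the right numbers come from tools like tuning-primer.sh, not from a blog post):

    [mysqld]
    key_buffer_size   = 128M    # MyISAM index cache, the main knob for this engine
    table_cache       = 512     # open table handles kept around
    query_cache_size  = 32M     # cache for repeated identical SELECTs
    thread_cache_size = 64      # reuse connection threads instead of spawning
    tmp_table_size    = 64M     # in-memory temp tables before spilling to disk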

    BB code caching


    It turns out phpbb converts BB codes to HTML "on the fly" on every page view. Not good. We cached the generated HTML in a separate database field for each post and signature. As a result the database grew almost 2x heavier, but the site began to fly.
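    The pattern itself is trivial; a sketch (the table, column and helper names here are invented, not phpbb's real ones):

    <?php
    // On read: use the cached HTML if present, otherwise render once and store it
    function post_html($post) {
        if ($post['post_html'] !== '') {
            return $post['post_html'];               // cache hit: no parsing at all
        }
        $html = bbcode_to_html($post['post_text']);  // the expensive conversion
        mysql_query(sprintf(
            "UPDATE bb_posts_text SET post_html = '%s' WHERE post_id = %d",
            mysql_real_escape_string($html), $post['post_id']
        ));
        return $html;
    }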

    Adapting the code to use SphinxSearch


    I once read a presentation from Flickr about how they built their search. Since their database ran on InnoDB, they set up a separate Master-MultipleSlaves farm on MyISAM just to handle the searches. Well, we are not that rich; we only have one server. So another of our developers summoned his willpower and moved the entire site search to the ultrafast SphinxSearch. The result exceeded all expectations. The server flies again.
    As a side effect, this let us introduce a simply super-mega-convenient RSS feed with built-in search that puts almost no load on the server.
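    Querying Sphinx from PHP looks roughly like this (a sketch using the official PHP client; the 'posts' index name and the port are examples):

    <?php
    require 'sphinxapi.php';                      // client shipped with Sphinx

    $cl = new SphinxClient();
    $cl->SetServer('localhost', 3312);            // searchd, old default port
    $cl->SetLimits(0, 50);                        // first 50 matches
    $result = $cl->Query('debian netinst', 'posts');

    if ($result && !empty($result['matches'])) {
        $ids = array_keys($result['matches']);    // document ids; fetch the rows from MySQL by these
    }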

    Code profiling and installing monitoring tools


    Strangely, many people still don't do this. In vain. If you don't know where the bottleneck is, you cannot eliminate it. To fix that, we added profiler hooks to the PHP code and installed munin on the server.
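    The profiler hooks themselves can be as dumb as this (a sketch; the 100 ms threshold and the label are arbitrary):

    <?php
    // Wrap suspicious code sections and log what they actually cost
    function prof_start() {
        return microtime(true);
    }
    function prof_stop($t0, $label) {
        $ms = (microtime(true) - $t0) * 1000;
        if ($ms > 100) {                          // only log the slow spots
            error_log(sprintf('[prof] %s took %.1f ms', $label, $ms));
        }
    }

    $t0 = prof_start();
    // ... section being measured, e.g. building the topic page ...
    prof_stop($t0, 'topic_view');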

    Parsing the MySQL slow query log


    A classic! 20% of database queries take 80% of the time. Take a closer look; maybe yours do too. After analyzing the logs, appending FORCE INDEX to a few queries and commenting out several lines of PHP, the peak-hour load dropped by half, and the main page started loading 10 (!!) times faster.
    In general, I highly recommend running this exercise once or twice a year, or after rolling out a batch of small changes. The mysqlsla tool helped a lot here.
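    The FORCE INDEX trick itself looks like this (the table and index names are invented for the example):

    -- Before: the optimizer picks a bad plan and crawls the table
    SELECT topic_id, topic_title
    FROM   bb_topics
    WHERE  forum_id = 7
    ORDER  BY topic_last_post_id DESC
    LIMIT  50;

    -- After: pin the query to the index we know is right
    SELECT topic_id, topic_title
    FROM   bb_topics FORCE INDEX (idx_forum_last_post)
    WHERE  forum_id = 7
    ORDER  BY topic_last_post_id DESC
    LIMIT  50;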

    Instead of an afterword


    That is how, in a few steps, we turned a regular LAMP stack into an integrated system. We now live on an ordinary Core2Duo 2 GHz with 3 GB of RAM; the laptops sold in stores these days are beefier than that, but it is enough for us, and the load average at rush hour does not rise above 1.5 with 200,000 peers and ~500 forum users. I wonder what size of server farm we would need if we had simply grown horizontally with LAMP and replication?
    We will see how everything changes once a single server is no longer enough for us.

    If you have read this far, the topic interests you. Well then, welcome to the Server Optimization blog!

    UPD: fixed typos and inaccuracies
