
The history of development and optimizations of one highly loaded resource

Introduction
It all started when I became a system administrator at a provincial Internet service provider. Along with administering all kinds of resources, I was handed one young but rapidly growing resource: a classic LAMP project, a site whose content was generated by its ordinary users.
* By the way, at the time I knew nothing about *nix systems, even though all the servers I inherited ran on them; I picked it all up quickly enough.
As usually happens with resources gaining popularity, the hardware everything runs on stops coping. The resource lived on an old dual-processor server that also ran almost all of the user-facing services. At the time, management did not see the resource as something worth investing in, so, to my regret (and later, fortunately), no money was allocated for new hardware.
nginx + php-fpm
Google came to the rescue. It turned out that for high-load projects people run PHP in FastCGI mode. There are several ways to implement this scheme, but in the end preference went to Russian-made software: the combination of nginx (Igor Sysoev) and php-fpm (Andrei Nigmatulin). The performance gain over Apache with mod_php comes from the fact that php-fpm spawns a pool of PHP worker processes that stay resident in the system and handle requests passed to them by the web server. This scheme saves the time and system resources spent launching the PHP interpreter for each request. Don't ask why I didn't choose, say, spawn-fcgi with lighttpd; I don't want to start holy wars on the subject, and it doesn't matter anyway. System performance went up, the load eased, and for a while I breathed a sigh of relief.
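For context, the hand-off from nginx to a php-fpm pool looks roughly like the server block below. This is a minimal sketch; the domain, paths and pool address are illustrative, not the actual configuration from that server:

```nginx
server {
    listen 80;
    server_name example.com;            # illustrative domain
    root /var/www/site;
    index index.php;

    location ~ \.php$ {
        include fastcgi_params;
        # hand the request to a resident php-fpm worker pool;
        # a unix socket (unix:/var/run/php-fpm.sock) also works here
        fastcgi_pass 127.0.0.1:9000;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }
}
```

The key point is that the workers in the pool are started once and reused across requests, rather than paying interpreter start-up costs per request.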
eAccelerator
The next step toward high performance was installing a PHP accelerator. Its job is to cache the compiled bytecode of a script. Indeed, why waste precious CPU time compiling a script to opcodes on every call? There could be up to 100 calls per second to the same script, so eAccelerator came in very handy. After installing it, system performance increased again, the load dropped noticeably, page generation time fell sharply, and users once more enjoyed a faster resource.
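An accelerator of this kind is enabled in php.ini. The fragment below is a sketch of a typical eAccelerator setup of that era; the extension path, cache directory and sizes are illustrative:

```ini
; load the eAccelerator extension and give it shared memory for cached opcodes
extension = "eaccelerator.so"
eaccelerator.shm_size  = "64"                 ; MB of shared memory for bytecode
eaccelerator.cache_dir = "/tmp/eaccelerator"  ; disk spill for cached entries
eaccelerator.enable    = "1"
eaccelerator.optimizer = "1"
```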
MySQL indexes
I should say that before joining the company I had dabbled in PHP quite a bit, so I knew the resource's code very well and understood what everything did and why; but I also understood that, as in any code, there were bottlenecks, and I set out to find them. After adding timers to the project that measured PHP run time and SQL query time, the bottleneck was found immediately. The resource was built on an open-source CMS, and judging by the code, some of its developers had no idea that indexes exist in MySQL. So began the long process of identifying and rewriting problematic queries and adding indexes where they were really needed. EXPLAIN became my companion for the next few days. As a result, the execution time of SQL queries dropped by a factor of 10-20 in places, and happy days returned for my dear users.
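A typical round of that work might look like the session below; the table and column names are invented for illustration, not taken from the actual CMS:

```sql
-- Before: EXPLAIN shows type=ALL (a full table scan) on a large table
EXPLAIN SELECT id, body
FROM comments
WHERE topic_id = 42
ORDER BY created_at DESC
LIMIT 20;

-- Add a composite index that covers both the WHERE and the ORDER BY
ALTER TABLE comments ADD INDEX idx_topic_created (topic_id, created_at);

-- After: EXPLAIN should now show type=ref with key=idx_topic_created,
-- examining a handful of rows instead of the whole table
```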
Memcached
By the time the load had settled down again, management had already allocated a dedicated server for the resource. But by then it had become a matter of sporting interest for me. When page generation time drops from 2 seconds to 0.5 seconds it is very inspiring, but I wanted more. I wanted to get rid of heavy SQL queries altogether, leaving only a critical minimum. Browsing around, I came across a talk by Andrei Smirnov, "Web, caching and memcached" (a presentation at HighLoad++ 2008). This was exactly what I needed! Memcached, the simplest and at the same time a high-performance caching server, originally developed for livejournal.com, fit perfectly into my scheme. Unfortunately, I could not use memcached to its fullest, since my work was not limited to this one web resource, but a lot was done nevertheless. I used memcached to cache the results of SQL queries and to store ready-rendered page blocks. On many pages of the site, generation time dropped to absurdly small numbers: 0.009 s! That was the biggest discovery and achievement of the whole period.
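The pattern described here is classic cache-aside: look in memcached first, and only on a miss run the SQL query (or render the block) and store the result with a TTL. Below is a self-contained sketch of that logic in Python; the `FakeMemcache` class is a tiny in-memory stand-in for a real memcached client, used only so the example runs on its own (the real site did the same thing in PHP against an actual memcached server):

```python
import time

class FakeMemcache:
    """Tiny in-memory stand-in for a memcached client (get/set with TTL)."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.time() >= expires_at:
            del self._store[key]   # entry expired, treat as a miss
            return None
        return value

    def set(self, key, value, ttl=60):
        self._store[key] = (value, time.time() + ttl)

cache = FakeMemcache()
query_count = {"n": 0}             # counts how often the "database" is hit

def run_expensive_query(topic_id):
    # stands in for a heavy SQL query or rendering a whole page block
    query_count["n"] += 1
    return f"rendered block for topic {topic_id}"

def get_block(topic_id, ttl=30):
    """Cache-aside: try the cache, fall back to the query, store the result."""
    key = f"block:{topic_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                          # cache hit: no SQL at all
    value = run_expensive_query(topic_id)      # cache miss: do the heavy work
    cache.set(key, value, ttl)
    return value
```

The TTL is the knob that trades data freshness for database load: within the TTL window, repeated requests for the same block never touch SQL at all.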
Sysctl and unix-sockets
Another important step in the fight for the quality of the resource was tuning sysctl. Under high load, the default configuration leaves much to be desired. A couple of weeks went into finding optimal parameters for the network subsystem. Also, where possible, php-fpm, memcached and MySQL were switched to unix sockets, taking the local TCP stack out of the picture for same-host connections. As a result, at peak load the server delivers content as quickly as it does with no load at all.
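I will not reproduce the exact values found back then, but a sysctl tuning pass for a busy web server typically touches parameters like these; the numbers below are illustrative starting points, not the values from that server:

```
# /etc/sysctl.conf fragment -- illustrative values, tune for your own workload
net.core.somaxconn = 4096               # accept() backlog limit
net.ipv4.tcp_max_syn_backlog = 4096     # queue for half-open connections
net.ipv4.tcp_tw_reuse = 1               # reuse TIME_WAIT sockets for new outbound connections
net.ipv4.ip_local_port_range = 1024 65535
net.core.netdev_max_backlog = 4096      # packets queued per NIC before the kernel drops them
```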
Sphinx
By the time I had mastered caching, I was wondering: is there anything else to speed up? Naturally! Search was the weakest spot. Guess why? Right: it used a terrible evil, LIKE in SQL queries. Then the Sphinx search engine came to the rescue! Unfortunately, in my case I only managed to use Sphinx for suggestions (the search-as-you-type hints), because the main SQL table was updated very frequently and data freshness was critical, so for the main search I had to give it up. Had I had more time for a detailed analysis, I might have solved that problem too, and Sphinx would have become another key point in the development of the service.
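For the suggestions use case, a Sphinx setup boils down to a source that pulls titles out of MySQL and an index with prefix matching enabled. The sphinx.conf sketch below uses invented names, paths and credentials purely for illustration:

```
source suggest_src
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx
    sql_pass  = secret
    sql_db    = site
    sql_query = SELECT id, title FROM topics
}

index suggest_idx
{
    source         = suggest_src
    path           = /var/lib/sphinx/suggest_idx
    min_prefix_len = 3        # index prefixes so partial words match as the user types
}
```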
MySQL to tmpfs
A lot of time passed after all my tuning and optimization. The resource became significant for the company, money was allocated for its support, and two servers were now serving it: the first acted as the web server and processed PHP, the second ran MySQL. But growth did not stop, and once the resource reached 600 requests per second, both tiers stopped coping; there were bottlenecks at both the PHP and the MySQL level. After receiving a third server, the question of scaling arose, but I could not come up with an ideal layout. Then I saw a post on Habr about running MySQL in tmpfs, and thought: why not? I did some preparatory work on the database, shrinking it as much as possible by removing unimportant but space-hungry features, including some in-database logging. In the end, the database went from 11 GB down to 2.5 GB. So it was decided to run PHP on two servers and MySQL, with its tmpfs doping, on the third. Since tmpfs lives in RAM and is wiped on reboot, regular backups are essential; the standard mysqlhotcopy utility handled them very well (1.5 seconds and you're done!). That is what I did. The load on MySQL dropped by a factor of 4, and overall throughput increased as well.
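The tmpfs setup itself is just a RAM-backed mount that MySQL's datadir is moved onto. The commands below sketch the idea; paths, sizes and credentials are illustrative, and everything on the mount is lost on reboot, hence the frequent backups:

```shell
# mount a RAM-backed filesystem big enough for the ~2.5 GB database (run as root)
mount -t tmpfs -o size=4G tmpfs /var/lib/mysql-tmpfs

# copy the trimmed-down datadir into RAM, then point my.cnf at it:
#   datadir = /var/lib/mysql-tmpfs
cp -a /var/lib/mysql/. /var/lib/mysql-tmpfs/

# tmpfs contents vanish on reboot or power loss, so back up frequently;
# for MyISAM tables mysqlhotcopy snapshots a database in seconds
mysqlhotcopy --user=root --password=secret sitedb /backup/mysql
```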
Conclusion
Why did I decide to tell all this? Perhaps this article will be of interest to the circle of people facing similar problems; if I had found something like it in my time, it would have helped me a lot. The resource will soon undergo major changes: everything will be rewritten, and nothing of the old will remain except the content and the users. For me this is the sunset of all my work on it, but I gained enormous experience from it, and experience is invaluable. This article will remain my memento of what I once did.
Throughout this work I was greatly helped by my colleague (CentAlt), who supported me in every possible way in all my endeavors, and for that I am very grateful. By the way, some of you may know him: he maintains a very useful repository for CentOS, where you can find fresh versions of nginx, php-fpm, unbound, clamav, postfix, dovecot, and so on.
The article does not claim to be authoritative or exhaustive. It is not an instruction on what should or should not be done. It is simply the story of one highly loaded project that I happened to develop and administer.
Thank you for reading to the end!