Web Project Optimization Strategy Using MySQL

    Introduction


    In the life of any large web project (especially in PHP, though this really applies to any server-side language used for web development) there usually comes a moment of realization that "you can't go on living like this," and that it is time to optimize the site so that it stops being slow (at least in production).

    Interestingly, even heavy frameworks (like Symfony or RoR) in "slow" languages usually run reasonably fast in production environments, and the main "brakes" come from SQL queries and illiterate caching. For example, initializing a fairly complex and large Symfony project configuration takes about 80 ms, while page execution times in such projects sometimes reach a second or more.

    If you have determined that this is your case and your project runs on MySQL, then this article can help you take concrete measures, lock in the results, and prevent obvious DBMS problems from reappearing later.

    Identification of bottlenecks


    To begin with, I would like to say that the real benefits of optimizing your work with MySQL will not always be felt, so before doing anything it is advisable to make sure the problems are really caused by MySQL. It is rather difficult to give general advice here, but in any case you should first measure, in your DBMS driver, the total query execution time as a share of the page execution time (in the production environment). If that share is around 50% or higher, then optimizing your work with MySQL is more than justified.
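
    For illustration, a minimal sketch of such a measurement in PHP with PDO might look like this (the TimingPDO wrapper below is hypothetical, invented for this example; real frameworks usually have an equivalent built in). It times every query() call and also counts queries per page, which covers the "queries per page" metric discussed below:

    <?php
    // Minimal sketch (PHP 8, hypothetical wrapper, not a real library): time
    // every query and compare the accumulated query time with total page time.
    class TimingPDO extends PDO
    {
        public float $queryTime  = 0.0; // total seconds spent in queries
        public int   $queryCount = 0;   // number of queries on this page

        public function query(string $sql, ...$args): PDOStatement|false
        {
            $start = microtime(true);
            $result = parent::query($sql, ...$args);
            $this->queryTime += microtime(true) - $start;
            $this->queryCount++;
            return $result;
        }
    }

    $pageStart = microtime(true);
    $db = new TimingPDO('mysql:host=localhost;dbname=test', 'user', 'password');

    // ... the normal page code runs its queries here ...

    $pageTime = microtime(true) - $pageStart;
    printf("%d queries, %.1f%% of page time spent in MySQL\n",
           $db->queryCount, 100 * $db->queryTime / $pageTime);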

    What and how to measure


    As a rule, even the most "bad" (performance-wise) frameworks and/or CMSes have some tools for tracking DBMS performance problems; usually, at least the number of queries per page is counted. Just in case, I will say that more than 100 queries per page is bad :), and that is the place to start.

    Let's assume there are not that many queries to the DBMS, yet it still slows down monstrously, and it is not at all clear what exactly has the greatest impact.

    Measuring query execution times


    Of course, the easiest and moderately effective way to track down "slow" queries is to measure the execution time of each query and display those times on the screen (and from that conclude which queries "slow down"). The problem is that this method is only more or less good for production environments, where the DBMS is well configured and the MySQL server has enough memory so that I/O is minimal; that is, where the query execution time really depends only on the query's complexity and the CPU time it takes, with a minimum of extraneous factors affecting performance. And if the query cache is enabled, then simply repeating the same query will give a completely different execution time (unless the query uses functions from the query cache stop list).

    If you use this approach in a dev environment, then queries often "slow down" not because they are complex, but simply because of random extraneous load or plain I/O. You can, of course, refresh the page right away and execute the same queries again, but even that does not guarantee the absence of side effects (for example, someone may have enabled the query cache on the dev MySQL server; if so, you should, of course, disable it immediately).

    Since query execution times on a dev database are often very "jumpy," the threshold for slow queries is either set to a very large value (in Propel with Symfony it is 100 ms by default), or slowly executing queries are simply ignored, with everything written off to I/O.

    Despite all these shortcomings, query execution time is, of course, the basic criterion for evaluating performance; you just need to be able to interpret those times correctly.
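
    If your framework does not log slow queries already, a rough per-query version of the same idea might look like this (a hypothetical helper; the 100 ms threshold mirrors Propel's default mentioned above):

    <?php
    // Sketch: log queries slower than a threshold. In real projects this
    // usually lives inside the framework's DB abstraction layer.
    const SLOW_QUERY_MS = 100; // Propel-under-Symfony default, see above

    function timedQuery(PDO $db, string $sql): PDOStatement|false
    {
        $start = microtime(true);
        $stmt = $db->query($sql);
        $ms = (microtime(true) - $start) * 1000;
        if ($ms > SLOW_QUERY_MS) {
            // In a dev environment a single slow measurement may just be
            // random I/O, so treat these numbers with caution.
            error_log(sprintf('[SLOW %.1f ms] %s', $ms, $sql));
        }
        return $stmt;
    }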

    Automatic EXPLAIN for queries


    You may be tempted to run every SELECT query a second time with the EXPLAIN prefix, and to take the product of the "rows" field over all unique query ids in the EXPLAIN output as an estimate of the query's complexity. The idea is good, but it is often hard to implement, for example because of difficulties with executing the same prepared statement. Still, that is not the biggest problem (the prepared statement issue can be solved one way or another). The biggest problem is that EXPLAIN in MySQL often blatantly lies.
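
    For what it is worth, a naive implementation of this automatic EXPLAIN could look like the sketch below (a hypothetical helper that ignores prepared statements and simply multiplies the "rows" estimates over the whole plan), but keep the caveats that follow firmly in mind:

    <?php
    // Naive automatic-EXPLAIN sketch (hypothetical helper): estimate query
    // complexity as the product of the optimizer's "rows" estimates.
    function explainCost(PDO $db, string $sql): ?int
    {
        if (stripos(ltrim($sql), 'SELECT') !== 0) {
            return null; // only plain SELECTs are EXPLAINed here
        }
        $cost = 1;
        foreach ($db->query('EXPLAIN ' . $sql) as $row) {
            $cost *= max(1, (int)$row['rows']);
        }
        return $cost; // remember: this is only the optimizer's estimate!
    }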

    Here is a simple example (say, a table with 100,000 records with random values in some_field):

    The table:
    CREATE TABLE some_table(id INT PRIMARY KEY AUTO_INCREMENT, some_field INT)
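
    To reproduce the example, the table could be populated, for instance, like this (one way of many; the value range is chosen so that 100500 never actually occurs):

    <?php
    // Sketch: fill some_table with 100,000 rows of random values.
    $db = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
    $stmt = $db->prepare('INSERT INTO some_table (some_field) VALUES (?)');
    $db->beginTransaction(); // speeds things up on InnoDB, harmless otherwise
    for ($i = 0; $i < 100000; $i++) {
        $stmt->execute([mt_rand(0, 99999)]);
    }
    $db->commit();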

    We execute a simple query that will scan 100,000 rows and find nothing:
    SELECT * FROM some_table WHERE some_field = 100500 ORDER BY id LIMIT 100

    We look at the EXPLAIN on MySQL 5.1.49:
    EXPLAIN SELECT * FROM some_table WHERE some_field = 100500 ORDER BY id LIMIT 100

    EXPLAIN says: boss, I only need to look at 100 rows, I'll use the primary key index, boss, everything will be fine, I swear on my mother!

    We execute a simple query that will scan 100 rows (or a little more) and return them immediately:
    SELECT * FROM some_table WHERE some_field <> 0 ORDER BY id LIMIT 100

    We look at the EXPLAIN on MySQL 5.0.51:
    EXPLAIN SELECT * FROM some_table WHERE some_field <> 0 ORDER BY id LIMIT 100

    EXPLAIN says: boss, I will have to look at 100,000 rows, I'll use the primary key index, it will take a long time, boss! (yes, that is exactly what it says, I give you my word :))

    These were very simple examples, so that you can easily see that the row counts in EXPLAIN should be treated with caution: after all, EXPLAIN does not execute the query, which means it cannot know how many rows it will actually need to read; it only makes an estimate (and is often off by a factor of 10 :)).

    Counting the number of rows read


    If you thought that EXPLAIN is the most advanced way to measure performance in MySQL, then you are mistaken (until recently, I thought so too :)).

    In fact, execute the following queries:
    
    SET query_cache_type = OFF; -- if the query cache is enabled, turn it off
    FLUSH STATUS; -- you will see why in a moment :)
    SELECT * FROM some_table WHERE some_field = 100500 ORDER BY id LIMIT 100; -- our query
    SHOW SESSION STATUS LIKE 'handler_%' -- magic ;)!
    

    It will produce a picture along these lines: the Handler_% counters (shown as a screenshot in the original article) reveal that the query actually read about 100,000 rows, even though EXPLAIN promised only 100.

    To better understand what a Handler is in MySQL, you can read, for example, the description of the HANDLER commands in MySQL. In short, a Handler is a common interface for reading/writing rows (and more) in MySQL across different table types. The name HandlerSocket for the corresponding interface in MySQL should also hint at what kind of beast it is.

    Thus, by executing the first two queries at the very beginning of the page (turning off the query cache and resetting the statistics), and the last query at the very end of the page, you can get the total number of rows read/written by MySQL. For a project where I did this kind of optimization, I added an alert() to the page in the dev version whenever more than 10,000 rows were read/written as a result of the page.

    Of course, this is not a panacea, but it can seriously help you find and "neutralize" slow pages on the site and take appropriate measures. This method does not care what the query is (it works even for INSERT INTO ... SELECT ...), and it always gives accurate statistics on the actions performed as a result of the query, even when EXPLAIN cannot say anything useful.
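
    Put together, in PHP the whole trick might look like this (a sketch; the 10,000-row threshold is the one mentioned above, and note that FLUSH STATUS may require the RELOAD privilege):

    <?php
    // Sketch: count the rows MySQL actually touched while building the page.
    $db = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');

    // At the very beginning of the page: disable the query cache, reset counters.
    $db->exec('SET query_cache_type = OFF');
    $db->exec('FLUSH STATUS');

    // ... the page executes all of its queries here ...

    // At the very end of the page: sum up the Handler_% counters.
    $rowsTouched = 0;
    foreach ($db->query("SHOW SESSION STATUS LIKE 'Handler_%'") as $row) {
        $rowsTouched += (int)$row['Value']; // rough: includes commits etc. too
    }
    if ($rowsTouched > 10000) { // dev-only alert, threshold from the text
        echo "<script>alert('MySQL touched {$rowsTouched} rows on this page!');</script>";
    }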

    The amount of information sent by the server


    SESSION STATUS actually contains much more information; for example, information about traffic between the client and the SQL server (the Bytes_% counters). If you have "wide" tables, this can also be relevant (especially when working with ORMs, which like to select all columns from a table even when they are not needed there).
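
    Checking it is a one-liner in the same spirit (a sketch reusing the $db connection from above; the counters are cumulative for the session, so reset them with FLUSH STATUS first if you want per-page numbers):

    <?php
    // Sketch: see how much traffic this session exchanged with the server.
    foreach ($db->query("SHOW SESSION STATUS LIKE 'Bytes_%'") as $row) {
        echo $row['Variable_name'], ': ', $row['Value'], "\n"; // Bytes_received, Bytes_sent
    }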

    Actually, query optimization


    After the bottlenecks have been found, you need to tackle optimizing those queries and/or reconsider what a particular block on the site should be doing.

    I hope the information in this article will be useful to you :). Using this technique, I was able to identify the bottlenecks and reduce the execution time of most pages on a Symfony site from ~1000 ms to somewhere between 200 and 300 ms, and to add a tool to the dev version to automatically prevent similar problems in the future. All of this took me about 3 days for a site with ~20 MB of active code of "our own" (about 100 MB of PHP code in total) :). Whether that is a lot or a little, judge for yourself :)
