maghamed September 16, 2008 at 10:02

MySQL Performance real life Tips and Tricks. Part 3-rd.

I decided to continue the series of notes on this topic. In this article, I wanted to give a special place to profiling MySQL queries. Describe the tools that MySQL provides for profiling, and what needs to be done to identify the bottlenecks of the query.

Also, after publishing the first two articles, I received a couple of reviews and questions related to database design / indexing / query writing. I tried to answer many questions. I will share with some of them in this article.

Request Profiling

So, I'll start with the standard query profiling tools using MySQL

EXPLAIN
SHOW STATUS
PROFILING

EXPLAIN

Using the EXPLAIN command, MySQL shows an approximate execution plan of the query (The description of all returned fields with the EXPLAIN command can be found at dev.mysql.com/doc/refman/5.0/en/using-explain.html ). In fact, in this definition the keyword is approximatebecause EXPLAIN may be wrong :-) The reason for these errors is that the generation of the execution plan is part of the query execution, sometimes the execution plan is built dynamically depending on the data. With the help of the EXPLAIN command, MySQL tries to simulate the execution of the query, but does not “touch” the data, and therefore does not have access to this dynamic component. MySQL makes this assessment mainly relying on index statistics. Which, in turn, should always be kept up-to-date, and depending on the intensity of queries to the database for adding / changing / deleting data, carry out a certain periodicity according to cron (say, every night) to execute queries - ANALYZE TABLE (rebuilds the index tree, not allowing him to make a list, this can happen if we, say, insert the ordered data, then the efficiency of the write search operation is not O (logn), but O (n). I also note that you should not always run the ANALYZE TABLE command on a production server, as at the time of its execution MySQL locks the tables MyISAM (read lock), InnoDB (write lock)), OPTIMIZE TABLE (defragmentation of the data file. Useful for large deletion of records from the table).
EXPLAIN also does not provide all the information for evaluating the speed of a query. For example, if a temporary table is used during a query, MySQL will not show whether it will be in-memory or in-disk temporary tables. Also, MySQL in the exploit will not show the price of write access operations or perform functions used in the query.

Also, in this paragraph I wanted to note the main thing - that the query performance depends on the number of records that were checked (examined) by MySQL, and not on the number of records returned by the query.

SHOW STATUS

MySQL has long been providing monitoring of system variables by means of the command - SHOW STATUS. Previously (before version 4.1.), This command showed the status of global variables, but since version 4.1. Now you can show the state of session variables - SHOW SESSION STATUS. (The manual for this command is dev.mysql.com/doc/refman/5.0/en/show-status.html )
I can only say that this command, unlike EXPLAIN, which is based on heuristic evaluations, shows directly WHAT happened after the request was executed, i.e. the number of records that MySQL physically accessed (moreover, using this command you can find out how many of them were received from memory, and how many by accessing the disk) I also want to note that this number is not an estimated result, but a real number (actually MySQL incrementing a variable all the time when it accesses each next line). In general, I will say that SHOW STATUS returns a huge amount of statistics, I naturally will not describe all of it. I will show the main points that should be noticed during her (statistics) analysis.

In order to get statistics regarding the query of interest to us, you first need to run the command:

FLUSH STATUS - this command will reset (zero) all session variables, global variables will not be affected.

After that we execute the query

SELECT bla-bla-bla FROM test_table as tt WHERE tt.bla = 'test' LIMIT 10

mysql> SHOW STATUS LIKE 'Key_read%';
+ -------------------- + ---------- +
| Variable_name | Value |
+ -------------------- + ---------- +
| Key_read_requests | 96882 |
| Key_reads | 36200 |
+ -------------------- + ---------- +

What can such information give? And what decisions to make based on this information?

Now let's try to analyze the obtained values and make a decision on optimization, based on these statistics.

Key_read_requests - The number of requests to read index blocks from the cache.
Key_reads - The number of physical reads of index blocks from disk. If the number of Key_reads is large, then your key_buffer_size is probably small enough and needs to be increased. The cache miss ratio can be calculated as Key_reads / Key_read_requests.

In this case, we see that more than a third of the index blocks are read from the disk, and not removed from the cache => accordingly, an increase in this parameter (key_buffer_size) will be an adequate measure of optimization.
But the following should also be taken into account: key_buffer_size- One of the most important configuration parameters, especially if you use MyISAM tables. Then the value of this parameter should be approximately 25-30% of the amount of free RAM. But you should also pay attention that this parameter is not unreasonably large (for example, if the total volume of .MYI index files is 1GB and key_buffer_size = 4GB, in this case you are wasting your memory). Also note that the maximum value for this parameter is 4GB (The maximum allowable setting for key_buffer_size is 4GB on 32-bit platforms. As of MySQL 5.0.52, values larger than 4GB are allowed for 64-bit platforms (except 64- bit Windows, for which large values are truncated to 4GB with a warning). The effective maximum size might be less, depending on your available physical RAM and per-process RAM limits imposed by your operating system or hardware platform. The value of this variable indicates the amount of memory requested. Internally, the server allocates as much memory as possible up to this amount, but the actual allocation might be less.dev.mysql.com/doc/refman/5.0/en/server-system-variables.html ). But even if you do not use MyISAM tables, the value of key_buffer_size should still be set to 32-64MB, because it will be used by indexes for temporary tables (MyISAM).

So, what other useful statistics can be obtained for more complex queries, for example, this:

FLUSH STATUS;

SELECT
  SQL_NO_CACHE fe.username, COUNT (*) as `count`
FROM
  ` tx_images` as ti
INNER JOIN
  `fe_users` as fe
  ON ti.cruser_id = fe.uid
GROUP BY
  fe.uid
ORDER BY
  ` count` desc;
* This source code was highlighted with Source Code Highlighter .

| Variable_name | Value |
+ ------------------------ + ------- +
| Select_full_join | 0 |
| Select_full_range_join | 0 |
| Select_range | 0 |
| Select_range_check | 0 |
| Select_scan | 2 |
+ ------------------------ + ------- +

Select_scan - The number of joins that did a full scan of the first table.

The last field shows that MySQL was doing a FULL TABLE SCAN, in fact this information is confirmed by EXPLAIN, the Extra field, which contains Using temporary; Using filesort.
If we have several tables in our query that are glued together during the query, then the other values in this list may not be zero.

mysql> SHOW SESSION STATUS LIKE 'Sort%';
+ ------------------- + ------- +
| Variable_name | Value |
+ ------------------- + ------- +
| Sort_merge_passes | 0 |
| Sort_range | 0 |
| Sort_rows | 598 |
| Sort_scan | 1 |
+ ------------------- + ------- +
4 rows in set (0.00 sec)

Again, I see the description of these fields from the manual.

Sort_rows - The number of sorted rows.
Sort_scan - The number of sorts that were done by scanning the table.

It is also useful to know whether temporary tables were created that were used during query execution on disk or in memory.

mysql> SHOW SESSION STATUS LIKE 'Created%';
+ ------------------------- + ------- +
| Variable_name | Value |
+ ------------------------- + ------- +
| Created_tmp_disk_tables | 0 |
| Created_tmp_files | 0 |
| Created_tmp_tables | 3 |
+ ------------------------- + ------- +
3 rows in set (0.00 sec)

Here MySQL shows that inmory was used tables, only for some reason shows that 3 of them were created :-) Actually, only one is created for this query.

Here I will do a little digression.
After publishing the first two articles, I received a couple of letters with questions and suggestions. So, as I understand it, a common mistake is a flaw in terms of architectural decisions, which pulls a very low performance in query queries.
For example, if you have a table containing some articles / news, etc. and has approximately the following structure.

CREATE TABLE `tx_articles` (
` uid` int (11) NOT NULL auto_increment,
`pid` int (11) NOT NULL default '0',
` tstamp` int (11) NOT NULL default '0',
`crdate` int ( 11) NOT NULL default '0',
`cruser_id` int (11) NOT NULL default '0',
` deleted` tinyint (4) NOT NULL default '0',
`hidden` tinyint (4) NOT NULL default '0' ,
`headline` varchar (70) NOT NULL default '',
` bodytext` text NOT NULL,
`type` int (11) NOT NULL default '0',
` citycat_id` int (11) NOT NULL default '0',
` editable` int (11) NOT NULL default '0',
`confirm` int (11) NOT NULL default '0',
` confirm_code` varchar (64) NOT NULL default '',
`editorspick` int (11) NOT NULL default' 0 ',
`newspaper_id` int (11) NOT NULL default '0',
PRIMARY KEY (` uid`),
KEY `parent` (` pid`),
KEY `citycat_id` (` citycat_id`, `tstamp`) KEY`
newspaper_id` ( `newspaper_id`)
KEY` tstamp` (`tstamp`)
) ENGINE = MyISAM DEFAULT CHARSET = latin1
* This source code was highlighted with Source Code Highlighter .

It is also assumed that the lion's share of queries to select will be addressed to this particular table. Moreover, as a rule, such a table can contain a sufficiently large number of columns, and some of them, as in the above case, are of the TEXT or BLOB data type (`bodytext` text NOT NULL), as well as variable fields (VARCHAR). All this complicates the work of MySQL, especially in sorting / grouping operations. Because Imagine that each record has an average length of 1Kb and when MySQL will sort / group such records, the server will simply "suffocate" every time such large amounts of data are tossed.
And this is not all the disadvantages of this approach. Particular attention should be paid to this.- MySQL for the temporary table uses the HEAP engine, which only works with fixed-length fields, i.e. he makes varchar char (the maximum length specified by the varchar declaration). This is done, as you know, for a quick search of records. But he does not know how to work with the TEXT and BLOB fields, so he converts the table type to MyISAM and the data is on the disk, and this is very slow.
Therefore, such an architecture should be avoided on large bases with a large number of hits and requiring a fast response time.

What is proposed to do? In some cases, the use of covering indexes and self-join can be useful, this example was described in my first article ( http://habrahabr.ru/blogs/mysql/38907/ ), so I won’t repeat ... I won’t repeat ...: - )

More correct, from the point of view of the architectural approach, in this case is the allocation of variable-length fields, which, as a rule, do not participate in the WHERE conditions (TEXT, BLOB, depending on the task condition, this may be VARCHAR) in a separate table. Accordingly, the basic conditions for the selection, as a rule, for index fields with a fixed length, we fulfill on the same table, also execute LIMIT on it and only after that we glue, add additional fields, in my example it will be “bodytext” from another table. Moreover, in this case, such gluing will be performed according to PRIMARY KEY.

PROFILING

Since version 5.0.37, MySQL includes the ability to profile queries. This utility writes statistics of query execution to the service database information_schema.

In order for MySQL to start recording statistics on request, set the value of the profiling variable to 1

mysql> set profiling = 1;
Query OK, 0 rows affected (0.00 sec)

* This source code was highlighted with Source Code Highlighter .

After that, we execute the SQL query (s) of interest to us.

Next, execute the following request

mysql> show profiles;
+ ---------- + ------------ + ------------------------- ---------------------- +
| Query_ID | Duration | Query |
+ ---------- + ------------ + ------------------------- ---------------------- +
| 0 | 0.00005100 | set profiling = 1 |
| 1 | 0.80246730 | SELECT SQL_NO_CACHE fe.username, COUNT (*) as `count` FROM` tx_images` as ti INNER JOIN `fe_users` as fe ON ti.cruser_id = fe.uid GROUP BY fe.uid ORDER BY` count` desc |
+ ---------- + ------------ + ------------------------- ---------------------- +
2 rows in set (0.00 sec)
* This source code was highlighted with Source Code Highlighter .

As I said earlier, these statistics are recorded in the database - information_schema, the table - profiling, so we can get statistics on the execution of the request by making a request to this table.

mysql> select sum (duration) from information_schema.profiling where query_id = 1;
+ --------------- +
| sum (duration) |
+ --------------- +
| 0.80246730 |
+ --------------- +
1 row in set (0.00 sec)
* This source code was highlighted with Source Code Highlighter .

Next, you can see the statistics of all stages of the query, this is done using the

mysql> show profile for query 1 command ;
+ -------------------- + ------------ +
| Status | Duration |
+ -------------------- + ------------ +
| (initialization) | 0.00007300 |
| Opening tables | 0.00005100 |
| System lock | 0.00000600 |
| Table lock | 0.00002000 |
| init | 0.00002200 |
| optimizing | 0.00003400 |
| statistics | 0.00010600 |
| preparing | 0.00014800 |
| executing | 0.50000700 |
| Sending data | 0.30226800 |
| end | 0.00000700 |
| query end | 0.00000500 |
| freeing items | 0.00001300 |
| closing tables | 0.00000700 |
| logging slow query | 0.00000400 |
+ -------------------- + ------------ +
15 rows in set (0.00 sec)

You can also see statistics with the help of profiling not only by SELECT queries. You can view statistics even for queries that change the structure of ALTER TABLE tables and modify / delete data. Accordingly, the execution stages of these requests will differ.

Also, this type of profiling allows you to monitor the processor load during each stage of the query, SWAP, etc.

Slow query log

A very useful tool for finding bottlenecks and slow database queries is to log slow queries, namely the --log-slow-queries directive when starting MySQL. Prior to MySQL version 5.1.21 , the minimum value for the long_query_time parameter is 1, and the default value is 10. This value must be an integer type. Those. if the query execution time exceeds the value of this parameter in seconds, then it is logged. It should also be noted that if the query falls into the slow query log, then this does not say that this is a bad query, maybe he waited a long time until the lock is released. Also useful is logging queries that do not use indexes --log-queries-not-using-indexes .
This is very useful information, but if we have a large project and a loaded database, then we need tools to monitor and visualize such statistics. I personally use MONyog

editorial answers :-)

Terer will answer a couple of questions that I didn’t have time to answer:
During the discussion of the second article, I in the comments ( http://habrahabr.ru/blogs/mysql/39260/#comment_941339 ) argued with habrachelovek juks about the appropriateness of using force index in queries.

Consider, for example, such a table structure

CREATE TABLE `some_table`
(
` id` INT (10) UNSIGNED NOT NULL AUTO_INCREMENT,
`field1` INT (10) NOT NULL,
` field2` VARCHAR (50) NOT NULL default '',
`field3` TIMESTAMP,
` field4` VARCHAR (255) default '',
PRIMARY KEY (`id`),
KEY fields1_2 (field1, field2 (8)),
KEY fields2_3 (field1, field3)
) ENGINE = InnoDB DEFAULT CHARSET =" UTF8 ";
* This source code was highlighted with Source Code Highlighter .

After that, fill it with data, for this we need, for example, such a script in PHP

set_time_limit (0);

mysql_connect ("127.0.0.1", "root", "root_password");
mysql_select_db ("database_name");

function random_string () {
      $ string = "ABCDEFGHIKJLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
      $ len = strlen ($ string);
      $ return = '';
      $ length = rand (10, 50);
      for ($ i = 0; $ i <$ length; $ i ++) {
         $ return. = $ string {rand (0, $ len)};
      }
      return $ return;
}

$ names = array ("maghamed", "maghamed1", "maghamed2", "maghamed3", "maghamed4");

mysql_query ("ALTER TABLE` some_table` DISABLE KEYS "); // stop updating non-unique indexes

for ($ i = 0; $ i <10000000; $ i ++) {
$ a = $ i% 1000;
$ b = rand (0, 4);
$ b = $ names [$ b];
$ c = random_string ();
$ sql = "INSERT INTO` some_table` SET field1 = $ {a}, field2 = '$ {b}', field4 = '$ {c}' ";
mysql_query ($ sql) or die ("Invalid query:„. mysql_error ());
}

mysql_query (“ALTER TABLE` some_table` ENABLE KEYS ");
?> * This source code was highlighted with Source Code Highlighter .

The code in this script is intuitive. I’ll pay attention only to “ALTER TABLE` some_table` DISABLE KEYS ”before inserting a large number of records, and“ ALTER TABLE `some_table` ENABLE KEYS”
after insertion. These directives will significantly speed up the script and will generally be useful in similar situations. “ALTER TABLE ... DISABLE KEYS” - stops updating non-unique indexes (in our case “fields1_2”, “fields2_3”) while inserting new records. MySQL uses a special algorithm, which is much faster than updating indexes after inserting each record, so disabling indexes before inserting large data should give significant speedup.

And such a query to this table:

mysql> EXPLAIN
  -> SELECT
  -> *
  -> FROM
  -> `some_table`
  -> WHERE
  -> field1 = 50 AND
  -> field2 = 'maghamed'
  -> ORDER BY
  -> field3 DESC
  -> LIMIT 100
  -> \ G
* *************************** 1. row ********************** ******
      id: 1
select_type: SIMPLE
    table: some_table
     type: ref
possible_keys: fields1_2, fields2_3
     key: fields1_2
   key_len: 30
     ref: const, const
     rows: 3042
    Extra: Using where; Using filesort
1 row in set (0.
Source Code Highlighter .

As you can see, MySQL in this case prefers to use the “fields1_2” index, which promotes a quick lookup, but sorting is performed without the help of the index, Extra field - Using where; Using filesort tells us this.

In this case, we can force MySQL to select the index at which the query will execute faster.

mysql> explain
  -> SELECT
  -> *
  -> FROM
  -> `some_table`
  -> FORCE INDEX (` fields2_3`)
  -> WHERE
  -> field1 = 50 AND
  -> field2 = 'maghamed'
  -> ORDER BY
  -> field3 DESC
  -> LIMIT 100;
+ ---- + ------------- + ------------ + ------ + ---------- ----- + ----------- + --------- + ------- + ------ + ------- ------ +
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ ---- + ------------- + ------------ + ------ + ---------- ----- + ----------- + --------- + ------- + ------ + ------- ------ +
| 1 | SIMPLE | some_table | ref | fields2_3 | fields2_3 | 4 | const | 1928 | Using where |
+ ---- + ------------- + ------------ + ------ + ---------- ----- + ----------- + --------- + ------- + ------ + ------- ------ +
1 row in set (0.03 sec)
* This source code was highlighted with Source Code Highlighter .

* Let me note that MySQL can opt for the `fields2_3` index, and without FORCE INDEX it depends on the version of MySQL and the relevance of the index statistics. Therefore it is so important to maintain the relevance of the indexes (ANALYZE TABLE `some_table`). You should not force MySQL to choose between quick search, or sorting by index in this case, because both options will be executed more slowly than the option in which we have a composite index for 3 fields.

First, delete the old index:

DROP INDEX `fields1_2` ON` some_table`
* This source code was highlighted with Source Code Highlighter .

And add a new one

CREATE INDEX `fields1_2_3` ON` some_table` (field1, field2 (8), `field3`);
* This source code was highlighted with Source Code Highlighter .

Also, after the first article where it was proposed to use the GROUP BY BINARY crc32 (tag_text) construct for quick grouping (GROUP BY) of results, I got a lot of questions and doubts about this option, because additional time is spent on performing crc32 functions, and many believe that it will be ineffective, some suggested using MD5 instead of CRC32 to reduce the number of possible coincidence of the results for different values.

I will answer all such doubts with such a small benchmark.

SET @input: = 'hello world';
SELECT BENCHMARK (1000000, CRC32 (@input)); * This source code was highlighted with Source Code Highlighter .

1 row in set (0.22 sec)

SET @input: = 'hello world';
SELECT BENCHMARK (1000000, MD5 (@input)); * This source code was highlighted with Source Code Highlighter .

1 row in set (6.02 sec)

As you can see, the CRC32 function is very fast
* BENCHMARK - executes the specified CRC32 or MD5 fu in this case the specified number of times 1,000,000

I will finish the third article on this, and so it turned out to be longer, than planned. I will try to answer all questions.
If the community is still not tired of my articles, then in the next article on this topic I wanted to talk about the problems of scale-ing of large tables and other MySQL tastes :-)

Tags: