Using EXPLAIN. Query Improvement

Original author: Shameer C
  • Transfer
When you execute a query, the MySQL query optimizer tries to come up with an optimal plan for this query. You can see this very plan using the query with the keyword EXPLAIN. EXPLAIN is one of the most powerful tools at your disposal for understanding MySQL queries and optimizing them, but the sad fact is that many developers rarely use it. In this article, you will learn about what data EXPLAIN offers on output and get acquainted with an example of how to use it to optimize queries.


What does EXPLAIN offer?


Using the EXPLAIN statement is simple. It must be added to queries before the SELECT statement. Let's analyze the output to get acquainted with the information returned by the team.

EXPLAIN SELECT * FROM categories


********************** 1. row **********************
id: 1
select_type: SIMPLE
table: categories
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 4
Extra: 
1 row in set (0.00 sec)


The output may not look exactly like that, however, it will contain the same 10 columns. What are these returned columns?

  • id - serial number for each SELECT inside the query (when there are several subqueries)
  • select_type is the type of the SELECT request.

    • SIMPLE - Simple SELECT query without subqueries or UNIONs
    • PRIMARY - this SELECT is the most external query in JOIN
    • DERIVED - this SELECT is part of a subquery inside FROM
    • SUBQUERY - the first SELECT in a subquery
    • DEPENDENT SUBQUERY - a subquery that depends on an external query
    • UNCACHABLE SUBQUERY - a non-cached subquery (there are certain conditions for a query to be cached)
    • UNION - second or subsequent SELECT in UNION
    • DEPENDENT UNION - the second or subsequent SELECT in UNION, dependent on an external query
    • UNION RESULT - the result of UNION

  • Table - the table to which the displayed row belongs.
  • Type - indicates how MySQL links the tables used. This is one of the most useful fields in the output because it can report missing indexes or why a written request should be reviewed and rewritten.
    Possible values:

    • System - the table has only one row
    • Const - the table has only one corresponding row, which is indexed. This is the fastest type of join because the table is read only once and the value of the string can be perceived as a constant in future joins.
    • Eq_ref - all parts of the index are used for binding. Indexes used: PRIMARY KEY or UNIQUE NOT NULL. This is another best possible type of binding.
    • Ref - all the corresponding rows in the index column are read for each row combination from the previous table. This join type for indexed columns looks like using the = or <=> operators
    • Fulltext - the connection uses the full-text index of the table
    • Ref_or_null is the same as ref, but also contains rows with a null value for the column
    • Index_merge - The connection uses a list of indexes to get the result set. The key column of the EXPLAIN command output will contain a list of indexes used.
    • Unique_subquery - the IN subquery returns only one result from the table and uses the primary key.
    • Index_subquery is the same as the previous one, but returns more than one result.
    • Range - the index used to find the corresponding row in a certain range, usually when the key column is compared with a constant using operators like: BETWEEN, IN,>,> =, etc.
    • Index - scans the entire index tree to find the corresponding rows.
    • All - Scan the entire table to find the corresponding rows. This is the worst type of compound and usually indicates the absence of suitable indexes in the table.

  • Possible_keys - Indicates indexes that can be used to find rows in a table. In practice, they may or may not be used. In fact, this column can do a good job in optimizing queries, because a NULL value indicates that no suitable index was found.
  • Key - indicates the used index. This column may contain an index not specified in the possible_keys column. In the process of joining tables, the optimizer looks for the best options and can find keys that are not displayed in possible_keys, but are more optimal for use.
  • Key_len is the index length that the MySQL optimizer has selected for use. For example, a key_len value of 4 means that memory is required to store 4 characters. Here is a link to this topic
  • Ref - indicates columns or constants that are compared with the index specified in the key field. MySQL will choose either a constant value for comparison or the field itself, based on the query execution plan.
  • Rows - Displays the number of records processed to obtain output. This is another very important field that provides an opportunity to optimize queries, especially those that use JOINs and subqueries.
  • Extra - contains additional information related to the query execution plan. Values ​​such as “Using temporary”, “Using filesort”, etc. may be an indicator of a problem request. You can find a complete list of possible values here.


After EXPLAIN, you can use the EXTENDED keyword in the query and MySQL will show you additional information about how the query is executed. To see this information, you need to immediately execute the SHOW WARNINGS query immediately after an EXTENDED request. It is most useful to look at this information about a query that was executed after any changes made by the optimizer of queries.

EXPLAIN EXTENDED SELECT City.Name FROM City
JOIN Country ON (City.CountryCode = Country.Code)
WHERE City.CountryCode = 'IND' AND Country.Continent = 'Asia'


********************** 1. row **********************
id: 1
select_type: SIMPLE
table: Country
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: const
rows: 1
filtered: 100.00
Extra: 
********************** 2. row **********************
           id: 1
  select_type: SIMPLE
        table: City
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4079
     filtered: 100.00
        Extra: Using where
2 rows in set, 1 warning (0.00 sec)


SHOW WARNINGS


********************** 1. row **********************
  Level: Note
   Code: 1003
Message: select `World`.`City`.`Name` AS `Name` from `World`.`City` join `World`.`Country` where ((`World`.`City`.`CountryCode` = 'IND'))
1 row in set (0.00 sec)


Troubleshooting performance issues with EXPLAIN.


Now let's look at how we can optimize a not-so-quick query by analyzing the output of the EXPLAIN command. There is no doubt that in current working applications there are a number of tables with many relationships between them, but sometimes it is difficult to predict the most optimal way to write a query.

I created a test database for an e-commerce application that does not have any indexes or primary keys, and I will demonstrate the effect of such a not-so-good way to create tables using scary queries. You can download the dump tables here - github.com/phpmasterdotcom/UsingExplainToWriteBetterMySQLQueries

EXPLAIN SELECT * FROM
orderdetails d
INNER JOIN orders o ON d.orderNumber = o.orderNumber
INNER JOIN products p ON p.productCode = d.productCode
INNER JOIN productlines l ON p.productLine = l.productLine
INNER JOIN customers c on c.customerNumber = o.customerNumber
WHERE o.orderNumber = 10101


If you look at the result (you will have to look at it only in the example below, the link above contains a dump with the keys already added), then you will see all the symptoms of a bad request.

UPDATE Here lies a fixed dump without indexes. For some reason, in the original author's dump, the indices were initially added.

But even if I write a better query, the result will be the same until I add indexes. The specified connection type is ALL (worst), which means that MySQL could not determine a single key that could be used in the connection. It follows that possible_keys and key are NULL. Most importantly, the rows field indicates that MySQL scans all the records of each table for the query. This means that it will scan 7 × 110 × 122 × 326 × 2996 = 91,750,822,240 records to find the suitable four (remove from the EXPLAIN request, check for yourself). This is very bad and the number of these records will increase exponentially as the database grows.

Now let's add obvious indexes, such as the primary key for each table, and run the query again. Taking this as a general rule, you can use the columns used in JOINs as candidates for adding keys, as MySQL always scans them to find the matching records.

ALTER TABLE customers
    ADD PRIMARY KEY (customerNumber);
ALTER TABLE employees
    ADD PRIMARY KEY (employeeNumber);
ALTER TABLE offices
    ADD PRIMARY KEY (officeCode);
ALTER TABLE orderdetails
    ADD PRIMARY KEY (orderNumber, productCode);
ALTER TABLE orders
    ADD PRIMARY KEY (orderNumber),
    ADD KEY (customerNumber);
ALTER TABLE payments
    ADD PRIMARY KEY (customerNumber, checkNumber);
ALTER TABLE productlines
    ADD PRIMARY KEY (productLine);
ALTER TABLE products 
    ADD PRIMARY KEY (productCode),
    ADD KEY (buyPrice),
    ADD KEY (productLine);
ALTER TABLE productvariants 
    ADD PRIMARY KEY (variantId),
    ADD KEY (buyPrice),
    ADD KEY (productCode);


Let's fulfill our previous query after adding the indexes. You will see this:

********************** 1. row **********************
           id: 1
  select_type: SIMPLE
        table: o
         type: const
possible_keys: PRIMARY,customerNumber
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: 
********************** 2. row **********************
           id: 1
  select_type: SIMPLE
        table: c
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: 
********************** 3. row **********************
           id: 1
  select_type: SIMPLE
        table: d
         type: ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 4
        Extra: 
********************** 4. row **********************
           id: 1
  select_type: SIMPLE
        table: p
         type: eq_ref
possible_keys: PRIMARY,productLine
          key: PRIMARY
      key_len: 17
          ref: classicmodels.d.productCode
         rows: 1
        Extra: 
********************** 5. row **********************
           id: 1
  select_type: SIMPLE
        table: l
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 52
          ref: classicmodels.p.productLine
         rows: 1
        Extra: 
5 rows in set (0.00 sec)


After adding indexes, the number of read records dropped to 1 × 1 × 4 × 1 × 1 = 4. For each record order_number = 10101 in the orderdetails table, this means that MySQL was able to find the corresponding records in all other tables using indexes and did not resort to a full table scan.

In the first output, you can use the join type - “const”, which is the fastest join type for tables with more than one record. MySQL was able to use the PRIMARY KEY as an index. In the “ref” field, “const” is displayed, which is nothing other than the value 10101 indicated in the query after the WHERE keyword.

We look at one more request. In it, we choose the union of two tables, products and productvariants, each combined with a productline. productvariants, which consists of different product options with a productCode field - a link to their prices.

EXPLAIN SELECT * FROM (
SELECT p.productName, p.productCode, p.buyPrice, l.productLine, p.status, l.status AS lineStatus FROM
products p
INNER JOIN productlines l ON p.productLine = l.productLine
UNION
SELECT v.variantName AS productName, v.productCode, p.buyPrice, l.productLine, p.status, l.status AS lineStatus FROM productvariants v
INNER JOIN products p ON p.productCode = v.productCode
INNER JOIN productlines l ON p.productLine = l.productLine
) products
WHERE status = 'Active' AND lineStatus = 'Active' AND buyPrice BETWEEN 30 AND 50


********************** 1. row **********************
           id: 1
  select_type: PRIMARY
        table: 
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 219
        Extra: Using where
********************** 2. row **********************
           id: 2
  select_type: DERIVED
        table: p
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 110
        Extra: 
********************** 3. row **********************
           id: 2
  select_type: DERIVED
        table: l
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 52
          ref: classicmodels.p.productLine
         rows: 1
        Extra: 
********************** 4. row **********************
           id: 3
  select_type: UNION
        table: v
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 109
        Extra: 
********************** 5. row **********************
           id: 3
  select_type: UNION
        table: p
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 17
          ref: classicmodels.v.productCode
         rows: 1
        Extra: 
********************** 6. row **********************
           id: 3
  select_type: UNION
        table: l
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 52
          ref: classicmodels.p.productLine
         rows: 1
        Extra: 
********************** 7. row **********************
           id: NULL
  select_type: UNION RESULT
        table: 
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: 
7 rows in set (0.01 sec)


You may notice a number of problems in this query. It scans all entries in products and productvarians. Because these tables do not have indexes for productLine and buyPrice columns; possible_keys and key fields display NULL values. The status of the products and productlines tables is checked after UNION, so moving them inside the UNION will reduce the number of records. Add indexes.

CREATE INDEX idx_buyPrice ON products(buyPrice);
CREATE INDEX idx_buyPrice ON productvariants(buyPrice);
CREATE INDEX idx_productCode ON productvariants(productCode);
CREATE INDEX idx_productLine ON products(productLine);


EXPLAIN SELECT * FROM (
SELECT p.productName, p.productCode, p.buyPrice, l.productLine, p.status, l.status as lineStatus FROM products p
INNER JOIN productlines AS l ON (p.productLine = l.productLine AND p.status = 'Active' AND l.status = 'Active') 
WHERE buyPrice BETWEEN 30 AND 50
UNION
SELECT v.variantName AS productName, v.productCode, p.buyPrice, l.productLine, p.status, l.status FROM productvariants v
INNER JOIN products p ON (p.productCode = v.productCode AND p.status = 'Active') 
INNER JOIN productlines l ON (p.productLine = l.productLine AND l.status = 'Active')
WHERE
v.buyPrice BETWEEN 30 AND 50
) product


********************** 1. row **********************
          id: 1
  select_type: PRIMARY
        table: 
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 12
        Extra: 
********************** 2. row **********************
           id: 2
  select_type: DERIVED
        table: p
         type: range
possible_keys: idx_buyPrice,idx_productLine
          key: idx_buyPrice
      key_len: 8
          ref: NULL
         rows: 23
        Extra: Using where
********************** 3. row **********************
           id: 2
  select_type: DERIVED
        table: l
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 52
          ref: classicmodels.p.productLine
         rows: 1
        Extra: Using where
********************** 4. row **********************
           id: 3
  select_type: UNION
        table: v
         type: range
possible_keys: idx_buyPrice,idx_productCode
          key: idx_buyPrice
      key_len: 9
          ref: NULL
         rows: 1
        Extra: Using where
********************** 5. row **********************
           id: 3
  select_type: UNION
        table: p
         type: eq_ref
possible_keys: PRIMARY,idx_productLine
          key: PRIMARY
      key_len: 17
          ref: classicmodels.v.productCode
         rows: 1
        Extra: Using where
********************** 6. row **********************
           id: 3
  select_type: UNION
        table: l
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 52
          ref: classicmodels.p.productLine
         rows: 1
        Extra: Using where
********************** 7. row **********************
           id: NULL
  select_type: UNION RESULT
        table: 
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: 
7 rows in set (0.01 sec)


As you can see, as a result, the number of scanned lines decreased from 2,625,810 (219 × 110 × 109) to 276 (12 × 23), which is an excellent gain in performance. If you execute the same query without previous permutations in the query immediately after adding indexes, you will not see such a decrease in the scanned rows. MySQL is not able to use indexes when WHERE is used in the derived result. After placing these conditions inside UNION, it becomes possible to use indexes. This means that adding indexes is not always enough. MySQL will not be able to use them until you write suitable queries. (http://www.php.su/mysql/manual/?page=MySQL_indexes - additional information).

Total


This article explores the EXPLAIN keyword, information on output, and examples of how you can use the output of a command to improve queries. In the real world, this command may be more useful than in the scenarios considered. You will almost always join a series of tables together using complex constructs with WHERE. At the same time, simply added indexes to tables will not always lead to the desired result. In this case, you need to review your requests.

Also popular now: