The story of one MySQL optimization

This post is about an optimization in a MySQL database.

This happened while we were building a system for email newsletters. The system had to send tens of millions of emails a day. Sending an email is not as easy as it seems, even though the process looks fairly straightforward:

  1. Assemble the email from the HTML creative and substitute personalized data.
  2. Add a tracking pixel and replace all links in the message with our own, in order to track clicks.
  3. Before sending, check that the address is not on the blacklist.
  4. Send the email through a specific pool.

Let me say more about the second point.
The mail-builder microservice prepares an email for sending:

  • it finds all the links in the email;
  • it generates a unique 32-character UUID for each link;
  • it replaces the original link with the new one and saves the data in the database.

As a result, every source link is replaced with a UUID and the domain is changed to ours. When a GET request arrives at such a link, we either proxy the original image or redirect to the original URL. The data is stored in MySQL: we save the generated UUID together with the original link and some meta information (the user's email, the mailing id and other data). Denormalization lets us fetch everything we need to record statistics or start a trigger chain in a single query.
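
To make the denormalized record more concrete, here is a rough sketch of saving one rewritten link into the Messages table described below (the JSON field names and values are made up for illustration, not taken from the real system):

INSERT INTO `Messages` (`UUID`, `Message`, `Inserted`)
VALUES (
  'f3a19c2d84b64b1e9d0c7a5e2b6f4a11',                  -- generated 32-character UUID
  JSON_OBJECT(
    'original_url', 'https://example.com/banner.png',  -- the link we replaced
    'email',        'user@example.com',                 -- recipient of the mailing
    'mailing_id',   42                                   -- mailing identifier
  ),
  CURDATE()                                              -- insertion date
);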

Problem number 1


Our UUID generation depended on the timestamp.

Since a mailing usually goes out within a limited time window, and many instances of the mail-builder microservice run in parallel, some of the UUIDs turned out to be very similar. This resulted in low selectivity. Update: because the values were so similar, working with the B-tree was not very efficient.

We solved this problem with Python's uuid module, using a generator with no time dependence.
It is remarkable how such an implicit detail had been slowing down our indexes.

How is the data stored?

The structure of the table was as follows:

CREATE TABLE IF NOT EXISTS `Messages` (
  `UUID` varchar(32) NOT NULL,
  `Message` json NOT NULL,
  `Inserted` DATE NOT NULL,
  PRIMARY KEY (`UUID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

At the time we created it, everything looked logical: UUID is the primary key, and in InnoDB the primary key is also the clustered index. When we select by this field, we simply read the record, because all of its values are stored right there in the index. This was a deliberate decision. (You can read more about clustered indexes in the MySQL documentation.)
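
As a quick illustration (the UUID value is made up), a lookup by the primary key is served entirely from the clustered index, with no extra step to fetch the row:

-- The B-tree leaf for the primary key already contains the whole row,
-- so this lookup needs no secondary index and no separate row fetch.
SELECT `Message`, `Inserted`
FROM `Messages`
WHERE `UUID` = 'f3a19c2d84b64b1e9d0c7a5e2b6f4a11';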

Everything was great until the table grew.

Problem number 2


If you read up on the clustered index, you will come across this nuance:
A new row is not appended to the end of the file or to the end of a flat list; it is inserted into the branch of the tree structure that corresponds to it by sort order.
So as the load grew, insertion time grew too.

The solution was to use a different table structure.

CREATE TABLE IF NOT EXISTS `Messages` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `UUID` varchar(32) NOT NULL,
  `Message` json NOT NULL,
  `Inserted` DATE NOT NULL,
  PRIMARY KEY (`ID`),
  UNIQUE KEY `UUID` (`UUID`, `Inserted`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Since the primary key is now auto-incremented and MySQL caches the last insert position, every insert now goes to the end of the index: InnoDB is optimized for writing sequentially increasing values.

I found the details of this optimization in the Postgres source code; MySQL implements a very similar one.
Of course, we had to add a unique key on UUID to avoid conflicts, but in exchange we got a higher insertion speed.

As the database grew even further, we started thinking about deleting old data. Using DELETE on the Inserted field is far from optimal: it takes a very long time, and the space is not reclaimed until we run the OPTIMIZE TABLE command. That operation, by the way, locks the table completely, which did not suit us at all.

Therefore, we decided to split the table into partitions:
one day per partition, with old partitions dropped automatically when their time comes.
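
The exact DDL is not shown here, but a minimal sketch of such daily RANGE partitioning might look like this (partition names and dates are invented; note that MySQL requires the partitioning column to appear in every unique key, including the primary key, which is why Inserted is added to both):

CREATE TABLE IF NOT EXISTS `Messages` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `UUID` varchar(32) NOT NULL,
  `Message` json NOT NULL,
  `Inserted` DATE NOT NULL,
  PRIMARY KEY (`ID`, `Inserted`),          -- partitioning column must be in every unique key
  UNIQUE KEY `UUID` (`UUID`, `Inserted`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE (TO_DAYS(`Inserted`)) (
  PARTITION p20200101 VALUES LESS THAN (TO_DAYS('2020-01-02')),
  PARTITION p20200102 VALUES LESS THAN (TO_DAYS('2020-01-03')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Dropping an expired day is nearly instant compared to DELETE + OPTIMIZE TABLE:
ALTER TABLE `Messages` DROP PARTITION p20200101;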

Problem number 3


We gained the ability to delete old data, but not the ability to read from only the partition we need: in the SELECT we specify only the UUID, so MySQL does not know which partition to look in and searches them all.

The solution grew out of Problem #1: add a timestamp to the generated UUID. This time we did it a little differently: we inserted the timestamp at a random position inside the string rather than at the beginning or the end, and added a dash before and after it so that it can be extracted with a regular expression.

With this optimization we can recover the date the UUID was generated and issue a SELECT with a specific value of the Inserted field. Now we read the data directly from the partition we need.
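
For example (the values are made up), once the application has parsed the date out of the UUID, the query carries both conditions and MySQL can prune down to a single daily partition:

-- Inserted is the date the application extracted from the timestamp inside the UUID.
SELECT `Message`
FROM `Messages`
WHERE `UUID` = 'f3a19c2d-1579093200-4b64b1e9d0c7'
  AND `Inserted` = '2020-01-15';

-- On MySQL 5.7+ the partitions column of EXPLAIN should list just one
-- partition (e.g. p20200115) instead of all of them.
EXPLAIN SELECT `Message`
FROM `Messages`
WHERE `UUID` = 'f3a19c2d-1579093200-4b64b1e9d0c7'
  AND `Inserted` = '2020-01-15';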

Also, thanks to things like ROW_FORMAT=COMPRESSED and changing the encoding to latin1, we saved even more disk space.
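
The exact statements are not given here; a sketch of how this might be applied is below (KEY_BLOCK_SIZE=8 is an assumed value, and the COMPRESSED row format needs innodb_file_per_table enabled):

-- The UUID column holds only ASCII characters, so latin1 is enough;
-- the JSON column keeps its own internal encoding either way.
ALTER TABLE `Messages` CONVERT TO CHARACTER SET latin1;

-- Compressed row format trades some CPU for disk space.
ALTER TABLE `Messages` ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;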
