Limits that need to be broken, or how we made our functional tests three times faster
Functional tests are useful. At first they do not take much time, but as a project grows, more and more tests are needed. We did not intend to tolerate slower delivery, so we pulled ourselves together and made our functional tests three times faster. The tips in this article are universal, but you will notice the biggest effect on large projects.
Briefly about the application
My team develops a public API that provides data to 2GIS users. When you go to 2gis.ru and search for "Supermarkets", you get a list of organizations: that data comes from our API. At our load of 2000+ RPS, almost any broken functionality becomes a critical problem.
The application is written in Scala, the tests are written in PHP, and the database is PostgreSQL 9.4. We have about 25,000 functional tests, and a full regression run takes 30 minutes on a dedicated virtual machine. The duration did not particularly bother us: we were used to tests that could take 60 minutes on the old framework.
How we sped up tests that were already "fast"
It all started by chance, as it usually does. We shipped one feature after another and wrote tests alongside them. The number of tests grew, and so did the time needed to run them. One day the tests started exceeding the time limit allotted to them, so the run was forcibly terminated. Tests that do not finish can mean a missed problem in the code.
We analyzed the speed of the tests, and speeding them up became an urgent task. So began an investigation titled "The tests are slow: fix it".
Below are three big problems that we found in the tests.
Problem 1: Incorrect use of jsQuery
All our data is stored in a PostgreSQL database, mostly as JSON, so we actively use jsQuery.
Here is an example of a query that we made in the database in order to get the necessary data:
SELECT * FROM firm
WHERE json_data @@ 'rubrics.@# > 0'
  AND json_data @@ 'address_name = *'
  AND json_data @@ 'contact_groups.#.contacts.#.type = "website"'
ORDER BY RANDOM()
LIMIT 1
It is easy to see that the example applies a separate '@@' operator to json_data for every condition, whereas the conditions should be combined in a single expression:
SELECT * FROM firm
WHERE json_data @@ 'rubrics.@# > 0 AND address_name = * AND contact_groups.#.contacts.#.type = "website"'
ORDER BY RANDOM()
LIMIT 1
Such flaws are not obvious, because in the tests we do not write every query by hand. Instead we use query builders, which compose the queries from the functions we call. It did not occur to us that this could affect query execution speed. In the code it looks something like this:
$qb = $this->createQueryBuilder()
    ->selectAllBranchFields()
    ->fromBranchPartition()
    ->hasRubric()
    ->hasAddressName()
    ->hasWebsite()
    ->orderByRandom()
    ->setMaxResults(1);
Do not repeat our mistakes: if there are several conditions on one JSONB field, describe them all within a single '@@' operator. After this rework, each query ran roughly twice as fast: the query above used to take 7500 ms and now takes 3500 ms.
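As an illustration of the idea, here is a minimal Python sketch of a builder that collects the conditions and emits them inside one '@@' expression. It is not our PHP query builder: the class name `FirmQueryBuilder` and its methods are hypothetical, and the sketch does no escaping of quotes inside conditions.

```python
class FirmQueryBuilder:
    """Hypothetical builder: collects jsQuery conditions and emits ONE @@ clause."""

    def __init__(self, table="firm", column="json_data"):
        self.table = table
        self.column = column
        self.conditions = []  # jsQuery fragments, joined later

    def has_rubric(self):
        self.conditions.append("rubrics.@# > 0")
        return self

    def has_address_name(self):
        self.conditions.append("address_name = *")
        return self

    def has_website(self):
        self.conditions.append('contact_groups.#.contacts.#.type = "website"')
        return self

    def to_sql(self):
        sql = f"SELECT * FROM {self.table}"
        if self.conditions:
            # Join all fragments inside a single jsQuery expression instead of
            # emitting "column @@ '...'" once per condition.
            jsquery = " AND ".join(self.conditions)
            sql += f" WHERE {self.column} @@ '{jsquery}'"
        return sql


query = FirmQueryBuilder().has_rubric().has_address_name().has_website().to_sql()
print(query)
```

The point of the design is that the builder methods only record fragments, and the single place that renders SQL decides how they are combined.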
Problem 2: Extra test data
Access to our API is granted by key, and each API user has their own. Previously, tests often needed to modify key settings, and because of that tests kept failing.
To eliminate these collisions, we decided to create several keys with the required settings at each regression run. Since creating a new key does not affect the functionality of the application as a whole, we assumed this approach would have no side effects. We lived like that for about a year, until we started looking into performance.
There are not that many keys: about 1000. To speed up the application, we keep them in memory and refresh them every few minutes or on demand. So after saving each new key, the tests triggered the synchronization process, but they did not wait for it to finish: the response was a "504", which was written to the logs. The application did not signal the problem in any way, so we thought everything was working fine, and the regression run simply continued. As it turned out, we had just been lucky all along that our keys were saved anyway.
We lived in ignorance until we checked the logs. It turned out that we created the keys but never deleted them after the test run. As a result, 500,000 of them had accumulated.
Do not repeat our mistakes: if your tests modify the database in any way, make sure the database is returned to its original state afterwards. After we cleaned up the database, updating the keys became 500 times faster.
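One way to make the cleanup hard to forget is to tie creation and deletion together. The sketch below is a Python illustration using an in-memory SQLite database, not our actual PHP tests or PostgreSQL schema: the `api_key` table and the `temporary_api_keys` helper are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def temporary_api_keys(conn, names):
    """Create keys for a test run and delete them afterwards, even if a test fails."""
    created_ids = []
    try:
        for name in names:
            cur = conn.execute("INSERT INTO api_key (name) VALUES (?)", (name,))
            created_ids.append(cur.lastrowid)
        conn.commit()
        yield created_ids
    finally:
        # Remove only the rows this run created, leaving other data untouched.
        conn.executemany(
            "DELETE FROM api_key WHERE id = ?", [(key_id,) for key_id in created_ids]
        )
        conn.commit()


def count_keys(conn):
    return conn.execute("SELECT COUNT(*) FROM api_key").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_key (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO api_key (name) VALUES ('pre-existing')")
conn.commit()

before = count_keys(conn)
with temporary_api_keys(conn, ["regression-a", "regression-b"]) as key_ids:
    during = count_keys(conn)  # the tests would run here
after = count_keys(conn)
print(before, during, after)
```

Because the deletion sits in a `finally` block, the table returns to its original size whether the tests inside the `with` block pass or raise.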
Problem 3: Random data sampling
We like to test the application against varied data. We have a lot of data, and problems crop up periodically. For example, there was a case when advertising data was not exported to us, and the tests caught the problem in time. That is why you can see ORDER BY RANDOM() in every query of our tests.
When we used EXPLAIN to compare queries with and without random ordering, we saw a 20-fold difference in performance. The example above runs in 160 ms without the random ordering. We thought hard about what to do, because we did not want to give up randomness completely.
For example, there are about 150 thousand companies in Novosibirsk, and when we looked for a company that has an address, a website and a rubric, we were picking a random record from almost the entire database. We decided to narrow the sample to the first 100 companies that match our conditions. The result of our deliberation was a compromise between constantly sampling different data and speed:
SELECT * FROM (
    SELECT * FROM firm_1
    WHERE json_data @@ 'rubrics.@# > 0 AND address_name = * AND contact_groups.#.contacts.#.type = "website"'
    LIMIT 100
) random_hack
ORDER BY RANDOM()
LIMIT 1;
In this simple way we lost almost nothing while gaining a 20-fold speedup. Such a query runs in 180 ms.
Do not repeat our mistakes. This one, admittedly, can hardly be called a mistake, but if you really have a lot of tests, always think about how much randomness you need in the data. A compromise between query speed and the variety of the sample helped us speed up SQL queries 20 times.
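The same pattern can be tried on a toy dataset. The following Python sketch uses an in-memory SQLite table rather than PostgreSQL with jsQuery, so it shows only the shape of the query, not the timings: the `firm` table, its `has_website` column and the 1000 generated rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE firm (id INTEGER PRIMARY KEY, has_website INTEGER)")
# 1000 hypothetical firms; the ones with odd ids have a website.
conn.executemany(
    "INSERT INTO firm (id, has_website) VALUES (?, ?)",
    [(i, i % 2) for i in range(1, 1001)],
)

# The inner query stops after 100 matching rows, so the random ordering
# shuffles at most 100 rows instead of every row that matches.
BOUNDED_RANDOM = """
    SELECT id FROM (
        SELECT id FROM firm WHERE has_website = 1 LIMIT 100
    ) AS random_hack
    ORDER BY RANDOM()
    LIMIT 1
"""

pool_size = conn.execute(
    "SELECT COUNT(*) FROM (SELECT id FROM firm WHERE has_website = 1 LIMIT 100)"
).fetchone()[0]
picked = conn.execute(BOUNDED_RANDOM).fetchone()[0]
print(pool_size, picked)
```

Note that without an ORDER BY in the inner query, the "first 100" rows are simply whichever rows the database returns first, so the pool tends to be the same from run to run. That is the variety being traded for speed.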
Once again, a short list of actions:
- If you specify several conditions for selecting data on a JSONB field, list them in a single '@@' operator.
- If you create test data, be sure to delete it, even if its presence does not seem to affect the functionality of the application.
- If you need random data for each run, find a compromise between the variety of the sample and the speed of execution.
We made the regression run three times faster thanks to simple (and for some, perhaps even obvious) changes. Our 25K tests now pass in 10 minutes. And this is not the limit: next in line is optimizing the code. Who knows how many unexpected discoveries are still waiting for us there.