dsemyriazhko May 29, 2019 at 22:37

How to stop forgetting about indexes and start checking execution plan in tests

Some time ago, an unpleasant story happened to me, which served as a trigger for a small project on the github and resulted in this article.

A typical day, a regular release: all tasks are checked up and down by our QA engineer, so with the calmness of the sacred cow, we roll onto the stage. The application behaves well, in the logs - silence. We decide to do switch (stage <-> prod). We switch, look at the instruments ... A

couple of minutes pass, the flight is stable. The QA engineer does a smoke test, notices that the application is somehow unnaturally slowing down. We write off to warm the caches.

A couple of minutes pass, the first complaint is from the first line: the data is downloaded from the clients for a very long time, the application slows down, it takes a long time to respond, etc. We start to worry ... we look at the logs, we look for possible reasons.

A couple of minutes later, a letter arrives from DB admins. They write that the execution time of queries to the database (hereinafter referred to as the database) has broken through all possible boundaries and tends to infinity.

I open monitoring (I use JavaMelody ), I find these requests. I start PGAdmin, I reproduce. Really long. I add “explain”, I look at the execution plan ... it is, we forgot about the indexes.

Why is code review not enough?

That incident taught me a lot. Yes, I “extinguished the fire” for an hour, creating the right index directly on the prod in something like this (do not forget about the CONCURRENTLY option):

CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_pets_name 
ON pets_table (name_column);

Agree, this was tantamount to a deployment with downtime. For the application I'm working on, this is unacceptable.

I made conclusions and added a special bold point to the checklist for code review: if I see that during the development process one of the Repository classes was added / changed - I check sql migrations for the presence of a script that creates and changes the index there. If he is not there, I am writing to the author a question: is he sure that the index is not needed here?

It is likely that an index is not needed if there is little data, but if we work with a table in which the number of rows are counted in millions, an index error can become fatal and lead to the story set out at the beginning of the article.

In this case, I ask the author of the pull request (hereinafter PR) to be 100% sure that the query that he wrote in HQL is at least partially covered by the index (Index Scan is used). For this, the developer:

launches the application
looking for converted (HQL -> SQL) query in the logs
opens PGAdmin or another database administration tool
generates in the local database, so as not to interfere with their experiments, an amount of data acceptable for tests (minimum 10K - 20K records)
fulfills the request
requests execution plan
carefully studies it and draws appropriate conclusions
adds / modifies the index, ensuring that the execution plan suits it
unsubscribes in PR that the coverage of the request checked
expertly assessing the risks and severity of the request, I can double-check its actions

A lot of routine actions and the human factor, but for some time I was satisfied, and I lived with this.

On the way home

They say it’s very useful at least sometimes to go from work without listening to music / podcasts along the way. At this time, just thinking about life, you can come to interesting conclusions and ideas.

One day I walked home and thought about what happened that day. There was a few review, I checked each one with a checklist and did a series of actions described above. I got so tired that time, I thought, what the hell? Is it not possible to do this automatically? .. I took a quick step, wanting to quickly “gash” this idea.

Formulation of the problem

What is most important for the developer in the execution plan?
Of course, seq scan on large amounts of data caused by the lack of an index.

Thus, it was necessary to do a test that:

Performed on a database with a configuration similar to the prod
Intercepts a database query made by a JPA repository (Hibernate)
Gets It Execution Plan
Parsit Execution Plan, laying it out in a data structure convenient for checks
Using a convenient set of Assert methods, checks expectations. For example, that seq scan is not used.

It was necessary to quickly test this hypothesis by making a prototype.

Solution Architecture

The first problem that was to be solved was the launch of the test on a real database that matches the version and settings with the one used on the prod.

Thanks to Docker & TestContainers , they solve this problem.

SqlInterceptor, ExecutionPlanQuery, ExecutionPlanParse, and AssertService are the interfaces that I have currently implemented for Postgres. The plans are to implement for other databases. If you want to participate - welcome. The code is written in Kotlin.

All this together I posted on GitHub and called checkinx-utils . You do not need to repeat this, just connect the dependency to checkinx in maven / gradle and use convenient asserts. How to do this, I will describe in more detail below.

Description of the interaction of CheckInx components

ProxyDataSource

The first problem that was to be solved was the interception of database queries ready for execution. Already with the established parameters, without questioning, etc.

To do this, you need to wrap the real dataSource in a certain Proxy, which would allow you to integrate into the query execution pipeline and, accordingly, intercept them.

Such ProxyDataSource has already been implemented by many. I used the ready-made ttddyy solution , which allows me to install my Listener intercepting the request I need.

I substitute the original DataSource using the DataSourceWrapper (BeanPostProcessor) class.

SqlInterceptor

In essence, its start () method sets its Listener to proxyDataSource and starts intercepting requests, storing them in the internal statements list. The stop () method, respectively, removes the installed Listener.

ExecutionPlanQuery

Here, the original request is transformed into a request for an execution plan. In the case of Postgres, this is an addition to the query keyword "EXPLAIN".

Further, this query is executed on the same database from testcontainders and a “raw” execution plan (list of lines) is returned.

ExecutionPlanParser

It is inconvenient to work with a “raw” execution plan. Therefore, I parse it into a tree consisting of nodes (PlanNode).

Let's analyze the PlanNode fields using an example of a real ExecutionPlan:

Index Scan using ix_pets_age on pets  (cost=0.29..8.77 rows=1 width=36)
  Index Cond: (age < 10)
  Filter: ((name)::text = 'Jack'::text)

Property	Example	Description
raw: String	Index Scan using ix_pets_age on pets (cost = 0.29..8.77 rows = 1 width = 36)	source string
table: String?	pets	table name
target: String?	ix_pets_age	index name
coverage: String?	Index scan	coating
coverageLevel	Half	coating abstraction (ZERO, HALF, FULL)
children: MutableList	-	child nodes
properties: MutableList>	key : Index Cond, value : (age <10); key : Filter, value : ((name) :: text = 'Jack' :: text)	properties
others: MutableList	-	All that could not be recognized in the current version of checkinx

AssertService

It is already possible to work normally with the data structure returned by the parser. CheckInxAssertService is a set of checks of the PlanNode tree described above. It allows you to set your own lambdas of checks or use the predefined, in my opinion, the most popular ones. For example, so that your query does not have Seq Scan, or you want to make sure that a specific index is used / not used.

Coveragelevel

Very important Enum, I will describe it separately:

Value	Description
NOT_USING	checks that a specific target (index) is not used
ZERO	index not used (Seq Scan)
Half	partial coverage of the query by index (Index Scan). For example, a search is performed by index, but for the resulting data it refers to a table
FULL	full coverage of the query by index (Index Only Scan)
UNKNOWN	unknown coverage. For some reason, it was not possible to install it.

Next, we’ll review a few usage examples.

Test Examples Using CheckInx

I did a separate project on GitHub checkinx-demo , where I implemented a JPA repository for the pets table and tests for this repository checking coverage, indexes, etc. It will be useful to look there as a starting point.

You might have a test like this:

@Test
fun testFindByLocation() {
   // ARRANGE
   val location = "Moscow"
   // Генерируем тестовые данные, их должно быть достаточно много 10К-20К.
   // Лучше использовать TestNG и вынести этот код в @BeforeClass
   IntRange(1, 10000).forEach {
       val pet = Pet()
       pet.id = UUID.randomUUID()
       pet.age = it
       pet.location = "Saint Petersburg"
       pet.name = "Jack-$it"
       repository.save(pet)
   }
   // ACT
   // Начинаем перехват запросов
   sqlInterceptor.startInterception()
   // Тестируемый метод
   val pets = repository.findByLocation(location)
   // Заканчиваем перехват
   sqlInterceptor.stopInterception()
   // ASSERT
   // Здесь можно проверить сколько запросов было исполнено
   assertEquals(1, sqlInterceptor.statements.size.toLong())
   // Убеждаемся, что используется индекс ix_pets_location с частичным покрытием (Index Scan)
   checkInxAssertService.assertCoverage(CoverageLevel.HALF, "ix_pets_location", sqlInterceptor.statements[0])
   // Если нам все равно какой индекс будет использоваться, но важно чтобы не было Seq Scan, мы можем проверить минимальный уровень покрытия
   checkInxAssertService.assertCoverage(CoverageLevel.HALF, sqlInterceptor.statements[0])
   // ... тоже самое, но используя свою лямбду
   checkInxAssertService.assertPlan(plan) {
           it.coverageLevel.level < CoverageLevel.FULL.level
       }
}

The implementation plan could be as follows:

Index Scan using ix_pets_location on pets pet0_  (cost=0.29..4.30 rows=1 width=46)
  Index Cond: ((location)::text = 'Moscow'::text)

... or like this if we forgot about the index (tests turn red):

Seq Scan on pets pet0_  (cost=0.00..19.00 rows=4 width=84)
  Filter: ((location)::text = 'Moscow'::text)

In my project, I mostly use the simplest assert, which says that there is no Seq Scan in the execution plan:

checkInxAssertService.assertCoverage(CoverageLevel.HALF, sqlInterceptor.statements[0])

The presence of such a test suggests that I, at least, studied the implementation plan.
It also makes project management more explicit, and documentability and predictability of the code increases.

Experienced mode

I recommend using the CheckInxAssertService, but if necessary, you can bypass the parsed tree (ExecutionPlanParser) yourself or, in general, parse the raw execution plan (the result of executing ExecutionPlanQuery).

@Test
fun testFindByLocation() {
   // ARRANGE
   val location = "Moscow"
   // ACT
   // Начинаем перехват запросов
   sqlInterceptor.startInterception()
   // Тестируемый метод
   val pets = repository.findByLocation(location)
   // Заканчиваем перехват
   sqlInterceptor.stopInterception()
   // ASSERT
   // Получаем "сырой" план выполнения
   val executionPlan = executionPlanQuery.execute(sqlInterceptor.statements[0])
   // Получаем распарсенный план - дерево
   val plan = executionPlanParser.parse(executionPlan)
   assertNotNull(plan)
   // ... сами делаем обход
   val rootNode = plan.rootPlanNode
   assertEquals("Index Scan", rootNode.coverage)
   assertEquals("ix_pets_location", rootNode.target)
   assertEquals("pets pet0_", rootNode.table)
}

Connection to the project

In my project, I allocated such tests to a separate group, calling it Intensive Integration Tests.

Connecting and starting using checkinx-utils is easy enough. Let's start with the build script.

Connect the repository first. Someday I will upload checkinx to maven, but now you can download artifact only from GitHub via jitpack.

repositories {
  // ...
   maven { url 'https://jitpack.io' }
}

Next, add the dependency:

dependencies {
// ...
   implementation 'com.github.tinkoffcreditsystems:checkinx-utils:0.2.0'
}

We complete the connection by adding the configuration. Only Postgres is currently supported.

@Profile("test")
@ImportAutoConfiguration(classes = [PostgresConfig::class])
@Configuration
open class CheckInxConfig

Pay attention to the test profile. Otherwise, you will find ProxyDataSource in your prod.

PostgresConfig connects several beans:

DataSourceWrapper
PostgresInterceptor
PostgresExecutionPlanParser
PostgresExecutionPlanQuery
CheckInxAssertServiceImpl

If you need some kind of customization that the current API does not provide, you can always replace one of the bean with your implementation.

Known Issues

Sometimes a DataSourceWrapper fails to replace the original dataSource due to the Spring CGLIB proxy. In this case, not a DataSource comes to BeanPostProcessor, but ScopedProxyFactoryBean and there are problems with type checking.

The easiest solution would be to manually create HikariDataSource for tests. Then your configuration will be as follows:

@Profile("test")
@ImportAutoConfiguration(classes = [PostgresConfig::class])
@Configuration
open class CheckInxConfig {
   @Primary
   @Bean
   @ConfigurationProperties("spring.datasource")
   open fun dataSource(): DataSource {
       return DataSourceBuilder.create()
           .type(HikariDataSource::class.java)
           .build()
   }
   @Bean
   @ConfigurationProperties("spring.datasource.configuration")
   open fun dataSource(properties: DataSourceProperties): HikariDataSource {
       return properties.initializeDataSourceBuilder()
           .type(HikariDataSource::class.java)
           .build()
   }
}

Development plans

I would like to understand if anyone other than me needs this? To do this, create a survey. I will be glad to answer honestly.
See what you really need and expand the standard list of assert methods.
Write implementations for other databases.
The construction of sqlInterceptor.statements [0] does not look very obvious, I want to improve it.

I would be glad if someone wants to join in and gain some credit by practicing in Kotlin.

Conclusion

I am sure that there will be comments: it is impossible to predict how the query planner will behave on the prod, it all depends on the statistics collected .

Indeed, a planner. Using the statistics collected earlier, it can build a plan different from the one being tested. The meaning is a little different.

The task of the planner is to improve, not worsen, the request. Therefore, for no apparent reason, he will not suddenly use Seq Scan, but you can unknowingly.

You need CheckInx so that when writing a test, do not forget to study the query execution plan and consider the possibility of creating an index, or vice versa, clearly show with a test that no indexes are needed here and you are satisfied with Seq Scan. This would save you unnecessary questions on the code review.

References

Only registered users can participate in the survey. Please come in.

Use of article material

8% I will use it now 5
53.2% Maybe someday I'll try 33
25.8% It was useful, but I will not use 16
12.9% Useless Material 8
0% Already checking execution plan in code with another tool 0

Tags: