Who is responsible for quality in agile development of complex projects, or the Quality Gates methodology

Today we are watching the waterfall development model gradually die out all over the world. It is disliked for being heavyweight and for reacting poorly to change. This directly affects the relevance of the product and increases TTM (time to market), which results in additional costs. Development teams are moving onto agile rails, and we are no exception.

The agile methodology was originally created for small teams that build a turnkey product end to end and are themselves responsible for its quality. But what if you develop highly critical banking systems that involve dozens of agile teams? How do you achieve the confidence in the product that long, exhaustive testing provides in waterfall? In this post we share our solution to this problem.

Everyone solves this problem differently, but it usually comes down to test automation. Teams develop tests that run against stubs and agree on general rules for forming the backlogs of neighboring teams; these are frameworks for cross-team interaction such as SAFe. As a result, thanks to synchronized backlogs, teams working on related products can write and run tests together, including integration tests. We have such frameworks too.

But now let us put ourselves in the place of the owner of a complex and highly critical banking system. Who is ultimately responsible for the quality of the whole product if a dozen agile teams, each responsible for its own part, work on it at once? We need confidence that nothing will fall over in production. Introduce additional testing? Hello, waterfall, and goodbye, TTM.

There is no perfect solution. In this situation, we will always have a conflict between the principles of the methodology and guaranteed reliability of the result. Here is a compromise we have found.

Quality gates

If the nature of your product means that it is not completely isolated from others, then at the points of contact everyone should play by one rule: maintain a minimum level of quality. The code should be covered by unit tests and should not contain critical information security defects, and distributions should pass smoke tests upon delivery. Nothing extreme, but the requirements are mandatory for everyone. Meeting them is the entry ticket to general testing.

This is what the Quality Gates practice looks like in general: a set of automated checks built into the DevOps pipeline of each system. In effect, it reflects the shift-left testing trend, which is now often discussed as part of DevOps.

We have agreed with all the teams on a series of checks, the quality gates, that they must pass at each stage of the life cycle.
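
As a rough illustration of the idea, a quality gate can be modeled as a named check that a build either passes or fails, with the pipeline stopping at the first failure. This is only a minimal sketch: the gate names, the fields of the `build` dictionary, and the `run_gates` helper are assumptions made for the example and do not describe our actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class QualityGate:
    """A named automated check; `check` returns True when the gate is passed."""
    name: str
    check: Callable[[Dict[str, object]], bool]


def run_gates(gates: List[QualityGate], build: Dict[str, object]) -> Tuple[bool, List[str]]:
    """Run the gates in order and stop at the first failure.

    Returns (all_passed, names_of_gates_passed_so_far).
    """
    passed: List[str] = []
    for gate in gates:
        if not gate.check(build):
            return False, passed
        passed.append(gate.name)
    return True, passed


# Hypothetical gates and build metadata, for illustration only.
gates = [
    QualityGate("unit-tests", lambda b: b["unit_tests_failed"] == 0),
    QualityGate("security-scan", lambda b: b["critical_vulnerabilities"] == 0),
    QualityGate("smoke-tests", lambda b: b["smoke_tests_passed"] is True),
]

build = {
    "unit_tests_failed": 0,
    "critical_vulnerabilities": 2,
    "smoke_tests_passed": True,
}

ok, passed = run_gates(gates, build)
print(ok, passed)
```

Stopping at the first failed gate is a design choice for the sketch: a build that fails an early gate never reaches the shared environments used by later stages.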

Coding

Before the code is built, a mandatory static analysis checks it for compliance with the standards of the specific programming language, as well as for unit test coverage. There are various tools for this; we, for example, like SonarQube. Having passed this quality gate, teams can be sure at an early stage that they have not violated a number of basic rules. For example, they have avoided significant duplication, which increases the complexity of the code and the likelihood of problems.
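
A coding gate of this kind could be sketched as a comparison of analysis metrics against agreed thresholds. The metric names and threshold values below are hypothetical and chosen only for the example; they are not taken from SonarQube or from our real configuration, and the metrics are assumed to have already been exported by whatever analysis tool is in use.

```python
from typing import Dict, List

# Hypothetical thresholds; the real values are whatever the teams agree on.
THRESHOLDS = {
    "min_coverage_percent": 80.0,
    "max_duplication_percent": 3.0,
    "max_blocker_issues": 0,
}


def coding_gate_violations(metrics: Dict[str, float]) -> List[str]:
    """Return a list of human-readable violations; an empty list means the gate is passed."""
    violations: List[str] = []
    if metrics["coverage_percent"] < THRESHOLDS["min_coverage_percent"]:
        violations.append("unit test coverage is below the minimum")
    if metrics["duplication_percent"] > THRESHOLDS["max_duplication_percent"]:
        violations.append("too much duplicated code")
    if metrics["blocker_issues"] > THRESHOLDS["max_blocker_issues"]:
        violations.append("blocker issues found by static analysis")
    return violations


# Example metrics, as they might be exported by an analysis tool.
metrics = {"coverage_percent": 72.5, "duplication_percent": 1.2, "blocker_issues": 0}
violations = coding_gate_violations(metrics)
print(violations)
```

Returning the list of violations rather than a bare pass/fail flag lets the pipeline tell the team exactly which rule stopped the build.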

The second check before the build is an information security (IS) check. There are generally accepted practices for identifying IS problems in code, and tools that can scan the code and flag dangerous places. For example, an incorrectly declared variable can lead to problems in production. We agreed that anything that can be detected while the code is being written must not be allowed to reach pentests and more complex checks.
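
One possible way to express such a gate is to classify scanner findings by severity and block the build on anything at or above an agreed level. The severity scale, the `Finding` structure, the rule names, and the default threshold here are all assumptions for illustration; a real scanner will have its own report format.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class Finding(NamedTuple):
    rule: str          # hypothetical rule identifier reported by a scanner
    file: str
    severity: Severity


def blocking_findings(findings: List[Finding],
                      threshold: Severity = Severity.HIGH) -> List[Finding]:
    """Return the findings that must be fixed before the code may be built."""
    return [f for f in findings if f.severity >= threshold]


# Made-up scanner output for the example.
findings = [
    Finding("hardcoded-password", "config.py", Severity.CRITICAL),
    Finding("weak-hash", "auth.py", Severity.MEDIUM),
]

blockers = blocking_findings(findings)
print([f.rule for f in blockers])
```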

Building the distribution

When building a distribution, we always check the result: that the build completed correctly, that all services started and work as they should, and that the distribution can be installed on the target environment and will work there. Such a build verification test eliminates potential misunderstandings between the tester and the developer. In waterfall practice, it happened that a developer finished the work and handed the distribution to the testers, and when it was installed on a test bench, it turned out that the build did not even start. The whole cycle then broke down, development dragged on, and nothing good came of it.
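
A build verification test along these lines can be sketched as a set of per-service probes, where a probe that crashes is treated the same as a service that did not start. The service names and probe functions below are hypothetical stand-ins; in practice a probe might call a health endpoint or inspect a process, which this sketch deliberately leaves out.

```python
from typing import Callable, Dict


def build_verification_test(probes: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run one probe per service and record whether the service looks alive."""
    results: Dict[str, bool] = {}
    for service, probe in probes.items():
        try:
            results[service] = bool(probe())
        except Exception:
            # A probe that raises counts as a service that did not start.
            results[service] = False
    return results


def build_is_deliverable(results: Dict[str, bool]) -> bool:
    """The distribution may be handed over only if every service passed its probe."""
    return bool(results) and all(results.values())


def unreachable_service() -> bool:
    # Stand-in for a probe whose target cannot be reached.
    raise ConnectionError("service did not respond")


# Hypothetical services of one distribution.
probes = {
    "api": lambda: True,
    "scheduler": lambda: True,
    "reporting": unreachable_service,
}

results = build_verification_test(probes)
print(results, build_is_deliverable(results))
```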

Our integration interactions are very complex. It is important not to break a test bench on which other teams may be running their checks. A bad distribution can do exactly that, and the neighbors on the bench will find out before we do, so we simply break their whole workflow. In addition, test data can be corrupted, and preparing it also costs money and takes considerable time, especially when it comes to anonymized user data.

Smoke tests

Each time the distribution is installed on a test bench, it goes through a series of simple smoke tests. First, the functionality of the distribution is tested on the system test bench. Then the distribution is deployed to the integration test bench, where integration interactions are tested and a set of smoke tests is run as well. If the distribution does not pass them, it cannot proceed to the next stage.

These quality gates give us a first impression of the quality of the distribution. If the smoke tests pass, the team proceeds to testing. If the distribution does not pass the smoke tests at this stage, it will most likely not pass manual testing either. We schedule manual testing only when the build is potentially ready to go to production.
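
The promotion rule described in this section can be sketched as a walk through the benches in order, stopping at the first one where the smoke tests fail. The bench order follows the text; the `first_failed_bench` helper, the distribution name, and the fake smoke function are assumptions made up for the example.

```python
from typing import Callable, List, Optional

# Bench order as described in the text: system first, then integration.
BENCHES: List[str] = ["system", "integration"]


def first_failed_bench(distribution: str,
                       smoke: Callable[[str, str], bool],
                       benches: Optional[List[str]] = None) -> Optional[str]:
    """Deploy bench by bench; return the bench where smoke tests failed, or None."""
    for bench in (benches if benches is not None else BENCHES):
        if not smoke(distribution, bench):
            # The distribution is not promoted past a failed bench.
            return bench
    return None


def fake_smoke(distribution: str, bench: str) -> bool:
    # Pretend the distribution works on the system bench but fails on integration.
    return bench == "system"


failed_at = first_failed_bench("billing-1.4.2", fake_smoke)
ready_for_manual_testing = failed_at is None
print(failed_at, ready_for_manual_testing)
```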

Quality gates as a framework

We want quality gates to become a full-fledged framework for managing development quality across a large number of products in agile. If a team constantly fails even the mandatory quality gates, that is a signal that there are problems that need to be discussed and solved. On the other hand, if a team has already mastered the basic quality gates and embedded them in its internal procedures, it can go further and add more quality gates.

In the future, we plan to roll out new sets of mandatory quality gates, as well as optional ones, so that each sufficiently mature team can choose what it needs. For example, if a team needs to work on the stability of its distribution on the integration benches, it will take one set of quality gates. If it needs to make sure that a complex, multicomponent build does not make deployment difficult, it will take another. Some teams lean toward front-end security, some toward load testing, bench availability, and response time checks, and some toward integration or data checks. Each team will be able to find quality gates for its case.
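
The split between mandatory and optional gates could be represented as a shared catalog from which each team selects its extras. This is only a sketch of the idea: the gate names in both sets and the `gates_for_team` helper are hypothetical and do not list our real catalog.

```python
from typing import Iterable, List

# Hypothetical catalog: mandatory gates apply to every team, optional ones are opt-in.
MANDATORY_GATES = ("static-analysis", "security-scan", "build-verification", "smoke-tests")
OPTIONAL_GATES = {"load-test", "bench-availability", "deploy-complexity", "data-checks"}


def gates_for_team(selected_optional: Iterable[str]) -> List[str]:
    """Return the full gate list for a team: all mandatory gates plus its chosen extras."""
    selected = set(selected_optional)
    unknown = selected - OPTIONAL_GATES
    if unknown:
        raise ValueError(f"unknown optional gates: {sorted(unknown)}")
    # Mandatory gates always come first and cannot be dropped.
    return list(MANDATORY_GATES) + sorted(selected)


team_gates = gates_for_team(["load-test", "data-checks"])
print(team_gates)
```

Rejecting unknown names keeps the catalog as the single source of truth, so a typo in a team's configuration does not silently skip a check.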

It is important to note that quality gates are not a replacement for testing but a primary control tool. Nobody is canceling testing. The main task here is to minimize, as early as possible, the damage that a poor-quality product does to other teams.

Example of a third-party pipeline including quality gates

Results of the implementation of quality gates

First of all, we have increased the stability of the production cycle. This is shift-left in action: we can detect critical functional bugs immediately. Less time is spent on repeated testing iterations, and defects are detected earlier, so fixing them is cheaper.

Lead time, the time from the start of coding a feature to its release in production, has decreased. The stability of the engineering stage of TTM has increased because we have reduced downtime in delivering the distribution to the production environment. We spend the same amount of time on testing itself, but we no longer have downtime because a test bench has collapsed and we have to wait for the distribution to be rebuilt.

The availability of test environments has grown. Previously, you could deploy a build to a test bench and forget about it for a week. In the meantime, adjacent teams could not test in that environment because your build was defective, and you would find out only a week later. Now, when you deploy a build to the test environment, you test it yourself for the most common problems and, if necessary, roll it back, fix it, and redeploy it. The chance of not getting in anyone's way becomes much higher. The implementation of quality gates should also reduce the cost of restoring test benches and re-preparing test data.

Your opinion?

As we said at the beginning, the contradiction between the principles of agile and the development of complex systems cannot be cut like a Gordian knot. One can only try to make it cause as few problems as possible. In our case, the quality gates practice helps, but of course we do not consider it ideal. How do you solve this problem? We would be very interested in discussing it.

Nikolay Vorobev-Sarmatov, Sberbank-Technologies, Sberworks
Thanks to Mikhail Bizhan for help in preparing the article!
