Automate it! How we improved integration testing
In the old days we had only a few services, and deploying updates to more than one of them to production in a single day counted as a great success. Then the world sped up, the system grew more complex, and we transformed into an organization with a microservice architecture. Now we have about a hundred services, and as their number grows, so does the release frequency: we now do more than 250 releases per week.
New features are tested inside the product teams; the integration testing team's job is to verify that the changes included in a release do not break the component, the system as a whole, or other features.
I work as a test automation engineer at Yandex.Money.
In this article I'll talk about how integration testing of our web services evolved and how we adapted the process to the growing number of system components and the rising release frequency.
The changes in the release cycle and the evolution of the rollout mechanism were described by our ops and dev colleagues in one of our previous articles. Here I'll cover how the testing processes changed during that transformation.
We now have about 30 development teams. A team usually includes a product manager, a project manager, front-end and back-end developers, and testers, all working on tasks for a specific product. As a rule, a service is owned by the team that changes it most often.
End-to-end acceptance testing
Not so long ago, only unit and component tests ran with each component release, and just a few of the most important end-to-end scenarios were run on a full test environment before the service went to production. As the number of components grew, the number of connections between them grew exponentially, and those connections were often far from trivial. I remember how the unavailability of the service that serves marketing data completely broke user registration (only for a short time, of course).
This approach to verifying changes failed more and more often. We needed to cover all critical business scenarios with autotests and run them on a full test environment against the component version that was about to be released.
Okay, the autotests for critical scenarios appeared, but how do we run them? We had to fit them into the release cycle while affecting its reliability as little as possible with false test failures. On the other hand, we wanted the integration testing stage to run as fast as possible. This is how our acceptance testing infrastructure came about.
We tried to make the most of the tools already used for driving a component through the release cycle and for launching jobs: Jira and Jenkins, respectively.
Acceptance Testing Cycle
For acceptance testing we defined the following cycle:
- monitor incoming acceptance testing tasks for a release,
- run a Jenkins job to install the release build on a test environment,
- check that the service has come up,
- run the Jenkins job with the integration tests,
- analyze the results of the run,
- rerun the tests if necessary,
- update the task status: done, or failed with the reason given in a comment.
The entire cycle was performed by hand every time. By the tenth release of the day, the same repetitive tasks made you want to swear, at best under your breath, clutching your head and demanding valerian.
Monitor Bot
We realized that tracking new Jira tasks and notifying the team about them are processes that are quick and easy to automate. So we wrote a bot that does exactly that.
The data for generating alerts arrives as push notifications from Jira. Once the bot was running, we stopped refreshing the dashboard page with acceptance tasks, and the automation engineer's smile got a little wider.
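As a rough illustration of the idea, here is a minimal sketch of such a bot: a tiny HTTP endpoint that accepts Jira webhook payloads and forwards an alert. The endpoint path, the regex-based payload handling, and the notifyTeam destination are assumptions for the example, not our actual implementation.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of a Jira webhook listener (hypothetical path, naive payload parsing). */
public class MonitorBot {
    // The real payload is JSON; for brevity we grab the issue key with a regex.
    private static final Pattern ISSUE_KEY = Pattern.compile("\"key\"\\s*:\\s*\"([A-Z]+-\\d+)\"");

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/jira-webhook", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                String body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                Matcher m = ISSUE_KEY.matcher(body);
                if (m.find()) {
                    notifyTeam("New acceptance task: " + m.group(1));
                }
            }
            exchange.sendResponseHeaders(204, -1); // acknowledge the webhook, no body
            exchange.close();
        });
        server.start();
    }

    // Stand-in for whatever messenger the alerts actually go to.
    private static void notifyTeam(String message) {
        System.out.println(message);
    }
}
```

In practice you would register a Jira webhook for issue-created events pointing at this endpoint and send the alert to the team's messenger instead of stdout.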
Pinger
We wanted to simplify the check that deployment to the test environment produced no build or installation errors and that the right version of the component came up, not some other one. A component reports its version and status over HTTP. Checking that a service returns the correct version would be simple and straightforward if the components were not written in different languages: some in Node.js, some in C#. On top of that, even our most popular services, written in Java, reported the version in yet another format.
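To give a feel for the problem, here is a sketch that normalizes the version from two hypothetical response formats; the formats our services actually return are different, but the idea is the same.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: extract a comparable version string from heterogeneous /status responses. */
public final class VersionParser {
    // Hypothetical formats: {"version":"1.2.3"} from one stack, "app-1.2.3 OK" plain text from another.
    private static final Pattern JSON_STYLE = Pattern.compile("\"version\"\\s*:\\s*\"([^\"]+)\"");
    private static final Pattern TEXT_STYLE = Pattern.compile("\\b(\\d+\\.\\d+\\.\\d+)\\b");

    public static String parse(String statusBody) {
        Matcher json = JSON_STYLE.matcher(statusBody);
        if (json.find()) {
            return json.group(1);
        }
        Matcher text = TEXT_STYLE.matcher(statusBody);
        if (text.find()) {
            return text.group(1);
        }
        throw new IllegalArgumentException("Unrecognized status format: " + statusBody);
    }
}
```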
We also wanted real-time information and notifications not only about version changes but also about changes in component availability. To solve this, the Pinger service appeared: it collects status and version information by polling the components in a loop.
We use a push model of data delivery: an agent deployed on every instance of the test environment collects information about the components of that environment and sends it to a central node every 10 seconds. We query that node for the current status; this approach lets us support more than a hundred test stands.
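A minimal sketch of such an agent, assuming each component exposes a plain /status endpoint over HTTP and the central node accepts a JSON snapshot via POST (both URLs below are made up):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch of a Pinger agent: polls local components and pushes a snapshot to the central node. */
public class PingerAgent {
    private static final HttpClient HTTP = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();
    // Hypothetical component list and collector URL for this environment.
    private static final List<String> COMPONENTS = List.of(
            "http://localhost:8081/status", "http://localhost:8082/status");
    private static final String COLLECTOR = "http://pinger-central.local/api/snapshot";

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(PingerAgent::collectAndPush, 0, 10, TimeUnit.SECONDS);
    }

    private static void collectAndPush() {
        StringBuilder json = new StringBuilder("[");
        for (String url : COMPONENTS) {
            String status;
            try {
                HttpResponse<String> resp = HTTP.send(
                        HttpRequest.newBuilder(URI.create(url)).GET().build(),
                        HttpResponse.BodyHandlers.ofString());
                // A fuller agent would also record the version parsed from the response body.
                status = resp.statusCode() == 200 ? "UP" : "DOWN";
            } catch (Exception e) {
                status = "DOWN";
            }
            if (json.length() > 1) json.append(',');
            json.append("{\"component\":\"").append(url)
                .append("\",\"status\":\"").append(status).append("\"}");
        }
        json.append(']');
        try {
            HTTP.send(HttpRequest.newBuilder(URI.create(COLLECTOR))
                            .header("Content-Type", "application/json")
                            .POST(HttpRequest.BodyPublishers.ofString(json.toString()))
                            .build(),
                    HttpResponse.BodyHandlers.discarding());
        } catch (Exception e) {
            // If the central node is unreachable, just try again on the next tick.
        }
    }
}
```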
Locker
Time for more complex tasks: automatic component updates and test runs. By then our team already had three test stands in OpenStack for acceptance tests, and first we had to solve the problem of managing their resources. It would be unpleasant if the next release were rolled out to a stand while tests were still running on it. It also happens that a stand is being debugged, in which case it should not be used for acceptance.
We wanted to be able to see which stands are busy and, when necessary, manually lock a stand while we analyze failed tests or until other work is finished.
This is what the Locker service is for. It keeps the long-lived state of each test stand (“busy” / “free”) and lets you attach a comment to “busy”, so it is clear whether we are debugging, recreating a copy of the test environment, or running tests for the next release. We also started locking stands for the night, when administrators run scheduled work on them such as backups and database synchronization.
A lock always has an expiration time, so people no longer have to return stands to the available pool by hand; the machine does it all.
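The locking logic itself boils down to something like this sketch, assuming an in-memory store (the real service of course keeps this state longer-lived and behind an API):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch of Locker: stand locks with a comment and a mandatory expiration time. */
public class Locker {
    public record Lock(String owner, String comment, Instant expiresAt) {}

    private final Map<String, Lock> locks = new HashMap<>();

    /** Try to lock a stand; fails if someone else holds a live lock on it. */
    public synchronized boolean tryLock(String stand, String owner, String comment, Duration ttl) {
        Lock current = locks.get(stand);
        if (current != null && current.expiresAt().isAfter(Instant.now())) {
            return false; // still busy
        }
        locks.put(stand, new Lock(owner, comment, Instant.now().plus(ttl)));
        return true;
    }

    /** Current lock, if it has not expired yet; an expired lock counts as "free". */
    public synchronized Optional<Lock> status(String stand) {
        Lock current = locks.get(stand);
        if (current == null || current.expiresAt().isBefore(Instant.now())) {
            return Optional.empty();
        }
        return Optional.of(current);
    }

    public synchronized void unlock(String stand) {
        locks.remove(stand);
    }
}
```

The mandatory TTL is exactly what lets stands return to the free pool without any human involvement.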
Duty
To spread the load of analyzing test runs evenly across the team, we introduced daily duty shifts. The person on duty handles the acceptance testing tasks for releases, triages failed autotests, and files bugs. If they cannot keep up with the flow of tasks, they can ask the team for help; meanwhile the rest of the team works on tasks unrelated to releases.
As the number of releases grew, a second duty role appeared; this person joins the first when a backlog builds up or critical releases are waiting in the queue. To show the progress of release testing, we created a page with the number of tasks in the “open” / “running” / “waiting for the duty engineer” states, the lock status of the test stands, and the components that are unavailable on them.
Being on duty requires concentration, so it comes with a perk: on their duty day, the duty engineer picks where the whole team goes for lunch near the office. Bribes in the style of “let me help you sort out the tasks, and today we'll go to my favorite place” look especially funny =)
Reporter
One of the problems we ran into when we introduced duty shifts was the need to hand knowledge over from one duty engineer to the next, for example about tests failing on a new release or the specifics of updating a component.
In addition, several new needs appeared:
- A category of tests emerged that fail from time to time because of test stand issues: an increased response time of one of the services, or slow loading of resources in the browser. We do not want to disable these tests, and all reasonable ways of making them more reliable have been exhausted.
- We had a second, experimental autotest project, and we needed to analyze the runs of both projects at once while looking at Allure reports.
- A test run can take up to 20 minutes, and you want to start analyzing the results as soon as the first failures appear, especially if the task is critical and the team responsible for the release is standing right behind you with pitiful eyes.
This is how the Reporter service appeared. We push test results into it in real time as the run goes on. The service keeps a database of known problems and bugs linked to specific tests. We also added publication of a summary run report from Reporter to the company wiki, which is convenient for managers who do not want to dive into the technical details that the Reporter or Allure interfaces are full of.
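To illustrate the “push results in real time” part, here is a sketch of a JUnit 5 extension that could report each test the moment it finishes; the Reporter URL and payload fields are assumptions, not our actual API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.jupiter.api.extension.ExtensionContext;
import org.junit.jupiter.api.extension.TestWatcher;

/** Sketch: report each test result to Reporter the moment the test finishes. */
public class ReporterExtension implements TestWatcher {
    private static final HttpClient HTTP = HttpClient.newHttpClient();
    private static final String REPORTER_URL = "http://reporter.local/api/results"; // hypothetical

    @Override
    public void testSuccessful(ExtensionContext context) {
        push(context.getDisplayName(), "PASSED", "");
    }

    @Override
    public void testFailed(ExtensionContext context, Throwable cause) {
        push(context.getDisplayName(), "FAILED", cause.getMessage());
    }

    private void push(String test, String status, String message) {
        // Naive JSON building is enough for a sketch; a real client would use a JSON library.
        String json = String.format(
                "{\"test\":\"%s\",\"status\":\"%s\",\"message\":\"%s\"}",
                test, status, message == null ? "" : message.replace("\"", "'"));
        try {
            HTTP.send(HttpRequest.newBuilder(URI.create(REPORTER_URL))
                            .header("Content-Type", "application/json")
                            .POST(HttpRequest.BodyPublishers.ofString(json))
                            .build(),
                    HttpResponse.BodyHandlers.discarding());
        } catch (Exception e) {
            // Losing a single report should never fail the test itself.
        }
    }
}
```

Such an extension can be registered with @ExtendWith on a base test class so that every test reports itself.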
When a test fails, Reporter shows the list of related bugs and fix tasks. This shortens triage time and makes it easier for team members to share knowledge about problems. Records for completed tasks are archived, but you can still look them up in a separate list if needed. To avoid loading internal services during business hours, we query Jira at night and archive the records of issues that have reached a final status.
A bonus of introducing Reporter was a database of runs, which we can use to analyze failure frequency and rank tests by stability or by “usefulness” in terms of the number of bugs they catch.
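As an illustration of what the run database makes possible, here is a sketch of ranking tests by pass rate over stored run records (the record shape is invented for the example):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: rank tests by pass rate using stored run records. */
public class StabilityReport {
    // Hypothetical record shape: one row per test execution.
    public record RunRecord(String testName, boolean passed) {}

    /** Returns test names ordered from least to most stable. */
    public static List<String> leastStableFirst(List<RunRecord> records) {
        Map<String, Double> passRate = records.stream()
                .collect(Collectors.groupingBy(
                        RunRecord::testName,
                        Collectors.averagingDouble(r -> r.passed() ? 1.0 : 0.0)));
        return passRate.entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```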
Autorun
Next we moved on to launching tests automatically when an acceptance testing task for a release appears in the issue tracker. For this we wrote the Autorun service: it checks whether there are new acceptance tasks in Jira and, if there are, determines the component name and version from the task contents.
For each task, several stages are executed:
- lock one of the free test stands via the Locker service,
- start the installation of the required component in Jenkins and wait for the required version to come up,
- run the tests,
- wait for the run to finish; while the tests execute, all results are pushed to Reporter,
- ask Reporter for the number of failed tests, excluding those that failed because of known problems,
- if zero tests failed, move the acceptance task to “Finish” and stop working with it. Everything is ready =)
- if there are “red” tests, move the task to “Waiting” and go to Reporter to triage them.
Switching between stages is organized as a finite state machine. Each stage knows the conditions for moving to the next one, and the results of a stage are stored in a task context shared by all stages of that task.
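A minimal sketch of such a state machine; the stage names and context keys are simplified for illustration, and the real stages of course talk to Locker, Jenkins and Reporter instead of hard-coding their results:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of Autorun's stage machine: each stage decides whether and where to go next. */
public class AcceptanceTask {
    /** Shared context for all stages of one task. */
    public static class Context {
        final Map<String, Object> data = new HashMap<>();
    }

    public enum Stage {
        LOCK_STAND {
            @Override Stage next(Context ctx) {
                ctx.data.put("stand", "test-stand-1"); // a real stage would call Locker here
                return DEPLOY;
            }
        },
        DEPLOY {
            @Override Stage next(Context ctx) {
                ctx.data.put("deployedVersion", "1.2.3"); // result of the Jenkins install job
                return RUN_TESTS;
            }
        },
        RUN_TESTS {
            @Override Stage next(Context ctx) {
                ctx.data.put("failedCount", 0); // taken from Reporter in reality
                return (Integer) ctx.data.get("failedCount") == 0 ? FINISH : WAITING;
            }
        },
        WAITING {
            @Override Stage next(Context ctx) { return null; } // a human takes over from here
        },
        FINISH {
            @Override Stage next(Context ctx) { return null; } // terminal stage
        };

        abstract Stage next(Context ctx);
    }

    public static void run() {
        Context ctx = new Context();
        for (Stage stage = Stage.LOCK_STAND; stage != null; stage = stage.next(ctx)) {
            System.out.println("Stage: " + stage);
        }
    }
}
```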
All this lets releases where 100 percent of the tests are green move through the deployment pipeline automatically. But what about instability caused not by problems in the component but by the “natural” quirks of UI tests or by increased network latency on the test stand?
For that we implemented a retry mechanism, which many people use but few admit to. The retries are organized as sequential test runs within a Jenkins Pipeline.
After a run, Jenkins asks Reporter for the list of failed tests and restarts only those, also reducing the number of threads. If the number of failed tests has not decreased compared to the previous run, we stop the job immediately. In our case this approach roughly doubles the success rate of acceptance testing.
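The actual retries live in a Jenkins Pipeline, but the logic boils down to roughly this sketch, where TestRunner stands in for the Jenkins jobs and the Reporter query for failed tests:

```java
import java.util.Set;

/** Sketch of the retry logic: rerun only failures, with fewer threads, and stop when it stops helping. */
public class RetryRunner {
    private static final int MAX_ATTEMPTS = 3;

    public static boolean runWithRetries(TestRunner runner) {
        Set<String> failed = runner.runAll(/* threads = */ 8);
        int attempt = 1;
        while (!failed.isEmpty() && attempt < MAX_ATTEMPTS) {
            // Rerun only what failed, with fewer threads to reduce flakiness from contention.
            Set<String> stillFailed = runner.run(failed, /* threads = */ 2);
            if (stillFailed.size() >= failed.size()) {
                return false; // no progress: the failures look real, stop retrying
            }
            failed = stillFailed;
            attempt++;
        }
        return failed.isEmpty();
    }

    /** Placeholder for the actual Jenkins jobs plus the Reporter query for failed tests. */
    public interface TestRunner {
        Set<String> runAll(int threads);
        Set<String> run(Set<String> tests, int threads);
    }
}
```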
Quick block
The resulting acceptance testing system lets us pass more than 60% of releases without any human involvement. But what about the rest? When necessary, the duty engineer files a bug against the component under test or a test-fixing task for the development team, and sometimes a test stand configuration bug for the operations department.
Test-fixing tasks often block automated testing from passing cleanly, because outdated tests stay “red”. Testers in the development teams are responsible for writing new tests and updating existing ones, submitting changes as pull requests to the autotest project. These changes go through a mandatory review, which takes time from both the reviewer and the author, so we want to temporarily disable outdated tests until their fix task reaches a final status.
At first we implemented a disabling mechanism based on annotations on test methods. It turned out that, because of the mandatory code review, disabling a test from the code is not always convenient and can take longer than we would like.
So we moved the list of test-blocking tasks into a new service with a web page, Quick-block. Now members of the team responsible for a component can block a test quickly. Before a run we query this service, get the list of quarantined tests, and move them to skipped status.
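One possible way to apply such a quarantine list in JUnit 5, as a sketch (the Quick-block URL and response format are assumptions):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.junit.jupiter.api.extension.ConditionEvaluationResult;
import org.junit.jupiter.api.extension.ExecutionCondition;
import org.junit.jupiter.api.extension.ExtensionContext;

/** Sketch: skip tests that Quick-block has quarantined. */
public class QuickBlockCondition implements ExecutionCondition {
    // Hypothetical endpoint returning quarantined test names, one per line.
    private static final String QUICK_BLOCK_URL = "http://quick-block.local/api/quarantine";
    // Fetched once per JVM for brevity; in reality the list is taken right before the run.
    private static final Set<String> QUARANTINED = fetchQuarantine();

    @Override
    public ConditionEvaluationResult evaluateExecutionCondition(ExtensionContext context) {
        String testName = context.getDisplayName();
        return QUARANTINED.contains(testName)
                ? ConditionEvaluationResult.disabled("Quarantined via Quick-block")
                : ConditionEvaluationResult.enabled("Not quarantined");
    }

    private static Set<String> fetchQuarantine() {
        try {
            HttpResponse<String> resp = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder(URI.create(QUICK_BLOCK_URL)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            return new HashSet<>(Arrays.asList(resp.body().split("\\R")));
        } catch (Exception e) {
            return Set.of(); // if Quick-block is unreachable, run everything
        }
    }
}
```

Tests under a class annotated with @ExtendWith(QuickBlockCondition.class) would then be skipped while their fix task is open.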
Summary
We have gone from accepting releases manually to an almost fully automatic process that can take more than 50 releases a day through acceptance testing. This helps the company ship changes faster, and our team gets the time to experiment and develop testing tools.
Going forward we plan to make the process more reliable, for example by distributing requests between a pair of instances of each of the services listed above. That would let us update the tools without downtime and enable new features for only part of the acceptance runs. We are also paying attention to stabilizing the tests themselves: a generator of refactoring tickets for the tests with the lowest success rate is in development.
More reliable tests will not only increase trust in them but also speed up release testing by removing reruns of failed scenarios.