From pull request to release. A report from Yandex.Taxi
Every service's release cycle has a critical period: from the moment the new version is ready to the moment it becomes available to users. The team's actions between these two checkpoints should be uniform from release to release and, wherever possible, automated. In this report, Sergey Pomazanov (alberist) describes the processes that follow every pull request in Yandex.Taxi.
- Good evening! My name is Sergey, and I head the automation group in Yandex.Taxi. In short, the main task of our group is to minimize the time developers spend on their tasks. This covers everything from CI to development and testing.
What do our developers do once the code is written?
To test new functionality, we first check everything locally. For local testing we have a large set of tests, and new code should also be covered by tests.
Our test coverage is not as good as we would like, but we try to keep it at a sufficient level.
For testing, we use Google Test and a homegrown pytest-based framework, with which we test not only the Python part but also the C++ part. The framework lets you start services, load data into the database before each test, update caches, mock all external requests, and so on. It is functional enough to run whatever you want and mock whatever you want, so that no request accidentally leaves the test environment.
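To give a feel for the approach, here is a minimal sketch in the spirit of such a framework, built on plain pytest fixtures. The toy service logic, fixture names and mock targets are illustrative, not the actual Yandex.Taxi framework API:

```python
import sys

import pytest


def charge(user_id, amount):
    """Pretend external billing call; tests must never hit the real one."""
    raise RuntimeError("a real external request escaped the test")


def create_order(db, user_id, amount):
    """Toy stand-in for real backend code, purely for illustration."""
    db["orders"].append({"user": user_id, "amount": amount})
    charge(user_id, amount)


@pytest.fixture
def db():
    # A fresh data set before each test, as the framework does for the DB.
    return {"orders": []}


@pytest.fixture
def mock_billing(monkeypatch):
    # Replace the external call so no request leaves the test environment.
    calls = []

    def fake_charge(user_id, amount):
        calls.append((user_id, amount))

    monkeypatch.setattr(sys.modules[__name__], "charge", fake_charge)
    return calls


def test_order_charges_passenger(db, mock_billing):
    create_order(db, user_id=42, amount=150)
    assert db["orders"] == [{"user": 42, "amount": 150}]
    assert mock_billing == [(42, 150)]
```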
In addition to functional tests, we have integration tests, which solve a different problem. If you are not sure that your service will interact correctly with other services, you can bring up the test stand and run the test suite against it. The suite is still fairly basic, but it is slowly expanding.
The stand is built on Docker and Docker Compose: each service runs in its own container, and they all interact with each other in an isolated environment, with its own network, its own database, and its own data set. The tests play out as if someone launched the mobile application, clicked the buttons, and made an order: virtual cars drive off, pick up the passenger, the passenger's money is charged, and so on. Essentially, every test exercises the interaction of several services and components at once.
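As a sketch of what such an end-to-end test might look like from the outside, assuming the stand exposes an HTTP API on localhost (the endpoints, port and field names below are invented for illustration):

```python
import time

import requests

# Hypothetical address of the locally running Docker Compose stand.
STAND_URL = "http://localhost:8080"


def test_full_order_flow():
    """Act like the mobile app: create an order, wait for the ride to end."""
    order = requests.post(
        f"{STAND_URL}/orders",
        json={"user_id": "test-user", "route": {"from": "A", "to": "B"}},
    ).json()

    # Poll until a virtual car has picked up the passenger and finished.
    for _ in range(60):
        status = requests.get(f"{STAND_URL}/orders/{order['id']}").json()
        if status["state"] == "finished":
            break
        time.sleep(1)
    else:
        raise AssertionError("order did not finish in time")

    # Billing should have charged the passenger by now.
    assert status["payment"]["charged"] is True
```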
Naturally, we test only our own services and our own components; external services are not ours to test, so we mock all of them.
The stand turned out to be convenient enough to run locally and get a "pocket taxi" out of it. You can run it on a local machine, a virtual machine, or any other development machine. Once the stand is up, you can take a mobile application adapted for the pocket taxi, point it at your computer, and make orders; everything works exactly as in production. If you need to check new functionality, you can simply drop your code in, and it will be picked up and run within the whole environment.
Alternatively, you can run just the service you need. To do that, you bring up a database and fill it with the necessary content, or take a database from one of the existing environments and connect the service to it. Then you can send it requests and see whether it behaves correctly.
Another important point is style checking. For the C++ code everything is simple: we use clang-format and check whether the code conforms to it. For Python we use as many as four analyzers: Flake8, Pylint, Mypy and, if I remember correctly, autopep8.
We use these analyzers mostly with their standard settings. Where there is a choice of style, we follow the Google style. The only check we added of our own verifies that imports are sorted correctly.
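For example, the import-sorting rule might look like this; the grouping below follows the common Google-style convention of standard library, third-party, then local imports, each block alphabetized, while the exact rules of the in-house checker are an assumption:

```python
# Standard library imports first, alphabetized.
import json
import os

# Third-party packages next.
import requests

# Local project imports last ('taxi' is a hypothetical package name).
from taxi import utils
```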
After the code is written and checked locally, you can open a pull request. Our pull requests live in GitHub.
Creating a pull request in GitHub triggers TeamCity. TeamCity automatically runs all the tests described above and writes the pass/fail status back into the pull request. So without opening TeamCity you can see whether the checks passed, and by following the link you can see what went wrong and what needs fixing.
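TeamCity reports these results through GitHub's commit status API. Roughly speaking, a CI system posts something like the following for each check; the token, repository and context names here are placeholders:

```python
import requests

GITHUB_API = "https://api.github.com"


def report_status(repo, sha, state, details_url, token):
    """Post a commit status; state is pending, success, failure or error."""
    resp = requests.post(
        f"{GITHUB_API}/repos/{repo}/statuses/{sha}",
        headers={"Authorization": f"token {token}"},
        json={
            "state": state,
            "target_url": details_url,    # link back to the CI build page
            "context": "teamcity/tests",  # label shown next to the check
            "description": "Test run result",
        },
    )
    resp.raise_for_status()
```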
If the pocket taxi and the tests are not enough and you want to check real interaction with a real service, we have a test environment that mirrors production. In fact we have two such environments: one for mobile development and testers, the other for developers. The test environment is as close to production as possible, and requests to external services are made from it as well. The only restriction is that, wherever possible, the test environment talks to the test installations of external resources, while the production environment talks to the production ones.
Rolling out to the test environment is done quite simply through TeamCity. You put the appropriate label on the pull request in GitHub and press the "Build custom" button, as we call it. After that, all pull requests carrying this label are merged together, and the automatic build of packages begins, followed by deployment to the clusters.
In addition to routine testing, load testing is sometimes required. If you change code that is part of a high-load service, you can run load tests against it. There are few highly loaded services left in Python, since we rewrote some of them in C++, but they still exist. Load testing goes through the Lunapark system, which uses Yandex.Tank; Tank is freely available, so you can download it and try it yourself. Tank fires requests at a service, builds charts, supports different load profiles, and shows what load the service was under and what resources it consumed. It is enough to click a button in TeamCity: the package is built and can then be rolled out to the right place, or you can simply upload and run it there manually.
While you are testing your code, one of the developers can start reviewing it.
Here is what we pay attention to during review:
One important point is that new functionality must be possible to switch off. Whatever the code is like, it may contain bugs, it may not work as originally intended, the managers may have wanted something different, or it may overload another service that was not ready for the new load. In all these cases we need the ability to turn it off quickly and bring everything back to a normal state.
We also have a rule that new functionality must be rolled out switched off, and enabled only after it has been rolled out to all clusters and all data centers.
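A minimal sketch of such a kill switch, assuming a runtime configuration storage that can be changed without a deploy; the function and flag names are illustrative, not the real system:

```python
def get_config(key, default):
    """Fetch a runtime config value; stubbed here with a static dict.
    In production this would read a database or config service."""
    return {"enable_new_pricing": False}.get(key, default)


def old_pricing(order):
    return order["distance_km"] * 10


def new_pricing(order):
    return order["distance_km"] * 9 + 25


def calculate_price(order):
    # New functionality ships switched off and is enabled only after
    # the code is running in every cluster and data center.
    if get_config("enable_new_pricing", default=False):
        return new_pricing(order)
    return old_pricing(order)  # safe fallback path
```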
Do not forget that our API is used by mobile applications that may not be updated for a long time. Any backward-incompatible change to the API will break some of those applications, and we cannot force all of them to upgrade; that would hurt our reputation. Therefore all new functionality must be backward compatible. This applies not only to the external API but also to the internal one, because you cannot roll out new code to all data centers, all machines, and all clusters at once. At some point the old code and the new code will be running simultaneously, and if they are incompatible, some requests will fail to be processed somewhere and we will get errors.
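As a sketch of what backward compatibility means in practice, a handler can treat a new request field as optional with a default, and only ever add response fields, so that both old clients and old backend instances keep working during the rollout; the field names are invented for illustration:

```python
def handle_order_request(body: dict) -> dict:
    # Old clients do not send 'payment_method'; fall back to the old
    # behaviour instead of rejecting the request.
    payment_method = body.get("payment_method", "cash")

    # Only add new response fields; never rename or remove existing ones,
    # so old parsers simply ignore what they do not understand.
    return {
        "order_id": "a1b2c3",  # placeholder id
        "status": "created",
        "payment_method": payment_method,
    }
```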
You should also keep in mind that if your code suddenly stops working, or you have written a new microservice with potential problems, you need to be prepared for the consequences and be able to degrade gracefully. My colleague will talk about this in the next presentation.
If you change a high-load service and the caller does not have to wait for some operation to finish, consider doing that work asynchronously, in the background, or as a separate process. A separate process affects production less, and the system as a whole runs more stably.
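A sketch of taking slow work off the request path with an in-process queue; in a real service this would more likely be a separate process or a task queue, and all names here are illustrative:

```python
import queue
import threading

tasks = queue.Queue()


def save_order(order):
    print("saved", order)  # stands in for the real storage call


def send_notifications(order):
    print("notified", order)  # stands in for the slow external call


def handle_request(order):
    save_order(order)  # the part the caller actually has to wait for
    tasks.put(order)   # notifications, analytics, etc. happen async
    return {"status": "accepted"}


def worker():
    while True:
        order = tasks.get()
        try:
            send_notifications(order)
        finally:
            tasks.task_done()


threading.Thread(target=worker, daemon=True).start()
handle_request({"id": 1})
tasks.join()  # in a real service the worker simply keeps running
```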
It is also important not to trust any data we receive from the outside; it must be validated and checked. All our data should be divided into data we produced ourselves and raw data that has not been validated. Raw data includes everything that could have come from other external services or directly from users, because literally anything can come in. Someone may deliberately send a malicious request, so everything must be checked.
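A sketch of this raw-versus-validated split: nothing from the outside is trusted until it passes explicit checks, and only whitelisted fields survive. The schema is invented for illustration:

```python
def validate_order_request(raw: dict) -> dict:
    """Turn an untrusted request body into a trusted, validated object."""
    if not isinstance(raw, dict):
        raise ValueError("body must be an object")

    user_id = raw.get("user_id")
    if not isinstance(user_id, str) or not user_id:
        raise ValueError("user_id must be a non-empty string")

    amount = raw.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)) \
            or amount <= 0:
        raise ValueError("amount must be a positive number")

    # Only whitelisted, checked fields make it into the trusted object;
    # anything unexpected in the raw input is dropped.
    return {"user_id": user_id, "amount": float(amount)}
```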
There are also cases where a service does not respond in time: the connection broke, or something else went wrong; many things can happen. The mobile application does not know what happened, so it simply retries the request.
It is very important that however many of these retries there are, the end result is the same as if there had been a single request: there must be no side effects. At the same time, remember that we have more than one service, many machines, several data centers, and a distributed database, so races are possible everywhere. The code must be written so that even if it runs in several places at the same time, no races occur.
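One common way to get this property is an idempotency key, sketched below. A real implementation would store the key in the shared database with an atomic insert, so that concurrent retries landing on different hosts cannot race; the in-memory dict here only illustrates the idea:

```python
processed = {}  # per-key results; in production this lives in the database


def charge_passenger(idempotency_key, user_id, amount):
    """The client sends the same key with every retry of one request."""
    if idempotency_key in processed:
        # A retry: return the stored answer, do not charge again.
        return processed[idempotency_key]

    result = {"user_id": user_id, "amount": amount, "status": "charged"}
    processed[idempotency_key] = result
    return result


first = charge_passenger("req-123", "user-42", 150)
retry = charge_passenger("req-123", "user-42", 150)  # no second charge
assert first == retry
```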
No less important is the ability to diagnose problems. There are always problems, in everything, and you need to understand where they occurred. Ideally, we learn about a problem through monitoring rather than through the support service. And when analyzing an incident, we should be able to understand what happened just by reading the logs, without reading the code. Even a person who has never seen the code should be able to work it out from the logs.
In the ideal case, even in a very complicated situation, you should be able to tell from the logs which path the program took and what happened; that greatly simplifies the post-mortem. The situation is already in the past, and you are unlikely to reproduce it now: the data is gone, or the data is different, or the circumstances are different.
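A sketch of what such diagnosable logging can look like: every record carries the order id and names the state transition, so a single order can be followed through the logs without opening the code. The format and field names are illustrative:

```python
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s order=%(order_id)s %(message)s",
    level=logging.INFO,
)
log = logging.getLogger("orders")


def process_order(order):
    # The same context travels with every record for this order.
    ctx = {"order_id": order["id"]}
    log.info("order accepted, state=new", extra=ctx)

    # Log which branch was taken, so the path is recoverable later.
    if order.get("payment") == "card":
        log.info("card payment chosen, authorizing", extra=ctx)
    else:
        log.info("cash payment chosen, skipping authorization", extra=ctx)

    log.info("driver search started", extra=ctx)


process_order({"id": "a1b2c3", "payment": "card"})
```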
If you add new database operations or create a new database, remember that there may be a lot of data. You may end up writing an unbounded number of records, and if you do not think about archiving them, the database will simply grow without limit, and no amount of disks or sharding will save you. It is important to be able to archive data and keep only the operational data that is currently needed. It is also necessary for every query to use an index, in every database. A single unindexed query can take down all of production; one small query against the busiest central collection can take everything down. We must be very careful here.
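The mention of collections suggests MongoDB, so here is a sketch with pymongo of the two habits just described: indexing every query pattern, and moving finished records into an archive collection so the hot collection stays small. The database, collection, and field names are illustrative:

```python
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["taxi"]

# Every query pattern gets an index; this one matches the archive query
# below, so it never scans the whole collection.
db.orders.create_index([("status", ASCENDING), ("finished_at", ASCENDING)])


def archive_finished_orders(before):
    """Move old finished orders out of the hot collection."""
    query = {"status": "finished", "finished_at": {"$lt": before}}
    for order in db.orders.find(query):
        db.orders_archive.insert_one(order)
        db.orders.delete_one({"_id": order["_id"]})
```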
We do not welcome premature optimization. If someone builds some kind of factory with a very universal method that handles hypothetical future cases, on the grounds that maybe someday someone will want to extend it, that is not accepted here. The extension may turn out to be needed in a completely different direction, the code may end up buried, and in the meantime it only complicates reading and understanding. And reading and understanding code is very important: the code should be simple and easy.
If you add a new database in your code or change the API, remember the documentation, which is partly generated from the code and partly kept on the wiki. It is important to keep it up to date; otherwise it can mislead other developers and cause problems. The code is written by one person, but maintained by many.
An important part is keeping to the common style. The main thing here is uniformity. When all the code is written uniformly, it is easy to read and understand, with no need to dig into every detail and nuance. Uniformly written code speeds up the whole development process in the long run.
One thing we deliberately do not do during review is look for bugs. Searching for bugs is the author's job. If a reviewer notices a bug, of course they will point it out, but there is no targeted bug hunt; that responsibility lies entirely with the person who wrote the code.
So, the code is written, the review is complete, and you are ready to merge. But it often happens that you first need to perform additional actions, such as a database migration.
For a migration, we write a Python script that can talk to the backend. The backend, in turn, has connections to all of our databases and can perform all the necessary operations. The script is launched through the script-launching admin panel; once it runs, you can see its log and results. For long bulk operations you cannot update everything at once: you have to work in chunks of 1,000-10,000 records, with pauses in between, so as not to accidentally take the database down.
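A sketch of such a chunked migration, assuming a pymongo-style collection and an invented "schema_version" field; the chunk size and pause are the parameters the text describes:

```python
import time

CHUNK_SIZE = 1000      # 1,000-10,000 documents per step, per the text
PAUSE_SECONDS = 1.0    # a breather so the database stays responsive


def migrate(collection):
    """Migrate documents in chunks instead of one huge update."""
    while True:
        # Pick the next chunk of documents that still need migrating.
        cursor = collection.find({"schema_version": {"$lt": 2}},
                                 projection=["_id"]).limit(CHUNK_SIZE)
        ids = [doc["_id"] for doc in cursor]
        if not ids:
            break  # nothing left, the migration is complete

        collection.update_many({"_id": {"$in": ids}},
                               {"$set": {"schema_version": 2}})
        time.sleep(PAUSE_SECONDS)  # let the database breathe
```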
When the code is written, reviewed, and tested, and all the migrations are done, you can safely merge it in GitHub and proceed to the release.
For some services we have regulations that say we must roll out at a certain time, but a significant share of our services can be rolled out at any moment.
This is all done with TeamCity.
It all starts with building packages. TeamCity runs git flow or something similar; we are slowly moving away from git flow towards our own schemes, which we found more convenient. TeamCity handles all of this: it builds the packages and uploads them. Then we wait for the tests to run against these packages. Passing the tests is mandatory for rolling out a release; if they fail, you first have to investigate what went wrong. The tests are the same ones as before, regular and integration, but now they check the already assembled package, exactly what will go to production. This is just in case something in the assembled package is broken or missing.
There is also a requirement to create a release ticket in our tracker, where each developer must reply describing how they tested their code, and which must list all the tasks included in the release.
This is also automated through TeamCity, which walks the list of commits. We require that each commit message contain the keyword "Relates" followed by the task id. A Python script goes through the commits, compiles the list of solved tasks, builds the list of authors, and creates the release ticket, calling on all the authors to report on their testing and confirm that they are ready to go into the release.
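A sketch of that commit-scanning step; the exact message format ("Relates: TICKET-123") and the ticket id pattern are assumptions, since the talk only names the keyword:

```python
import re
import subprocess

RELATES_RE = re.compile(r"Relates:\s*([A-Z]+-\d+)")


def tasks_in_release(prev_tag, new_tag):
    """Collect task ids mentioned in commits between two release tags."""
    log = subprocess.run(
        ["git", "log", "--format=%B", f"{prev_tag}..{new_tag}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sorted(set(RELATES_RE.findall(log)))
```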
When everyone is ready and the confirmations are collected, the rollout begins, first to pre-stable. Pre-stable is a small part of production: each service runs in several data centers, each data center may have several machines, and one machine or a couple of them form the pre-stable, where the new code lands first.
While the code is rolling out, we watch the charts, the logs, and what happens to the service. If everything is fine, if the graphs show that everything is stable, and everyone has checked that their functionality works as it should, we roll out to the rest of the environment, which we call stable. Rolling out to stable is the same: we watch the graphs and the logs and check that everything is fine.
It has rolled out, and everything is fine. But what if something went wrong, what if suddenly there is a problem?
Then we build a hotfix. It follows the same principle as git flow: a branch is made from the master branch, a separate pull request against master is created with the fixes, and then a script launched from TeamCity picks it up, performs all the necessary operations, builds all the packages in the same way, and rolls them out.
To finish, I would like to talk about where we are heading. We are moving toward a single repository, where many services live side by side. Each of them has independent pipelines for testing and for releases. For pull requests, TeamCity checks which files were affected and which services they belong to, and from the dependency graph determines which tests ultimately need to run and what to check. We strive to isolate services from each other as much as possible. It is not entirely successful yet, but we are working toward having multiple services live in one repository and share some common code without that causing problems, which should simplify development life. That's all, thank you all.