Development process and release rollout at Badoo. Automated testing. Development environment
In July, together with the hosts of the IT Compot podcast, Badoo release engineers Vladislav Chernov and Oleg Oyamäe recorded the episode “Development process and release rollout at Badoo. Automated testing. Development environment.”
Since the previous podcast drew interest from listeners and readers, we turned this one into an article as well.
What we talked about:
Development process and release rollout at Badoo. Tools used.
- Git workflow. Each task in a separate branch;
- Using JIRA, TeamCity and AIDA;
- Building a release and rolling out two releases per day. Problems and their solutions (rollbacks, patches, etc.).
Automated testing.
- What we use;
- How we run tests;
- Code coverage;
- The “Startup” test launcher. 18,000 tests in 3.5 minutes.
Plus recommendations from the guys: useful books, articles, etc.
Anton Kopylov: Hello everyone, you are listening to episode 48 of the IT Compot podcast, and with you is one of its hosts, Anton Kopylov.
Anton Sergeev: And Anton Sergeev, hello.
Anton Sergeev: Today our guests are two brave guys, Badoo release engineers Oleg Oyamäe and Vladislav Chernov. Hi guys!
Vladislav Chernov: Hello.
Oleg Oyamäe: Hello.
Anton Sergeev: This is turning into a series of podcasts with Badoo. Today we decided to talk in more detail about an area that is, if not Badoo's pride, then certainly a very important achievement and a clear success: the company rolls out new releases using fully automated tools. On top of that, the company has set up its testing process in a very interesting way that lets it run a large number of tests efficiently. Everyone who listened to the previous episode already knows about the two releases per day; Alexey Rybak talked about that.
Well, let's get started. Vlad and Oleg, I suggest you tell us briefly about yourselves and what you do at Badoo.
Vladislav Chernov: Hello again. Oleg and I work on configuration management and release engineering at Badoo. As for me, I have done release engineering for most of my career: I started as an ordinary release engineer, did a lot by hand, rolled out simple releases, and then moved into automation and automated more and more of the process. Now I work on automating the entire development and testing business process at Badoo.
Anton Sergeev: I see, cool. And Oleg is your colleague and helps you with this, right?
Oleg Oyamäe: Actually, my story is a little different. I was a developer and team lead at other companies, and at Badoo I decided to try a field that was new to me, release engineering. But I do more programming than automation. Let me tell you a little about how things were set up at Badoo from the beginning to the present day.
Anton Sergeev: Yes, let's talk about the history of your development process and how you started doing all this, because you obviously did not reach the current state right away. There were probably all sorts of obstacles and problems to solve. Tell us how it all began, why it turned out the way it is now, and how you got here.
Oleg Oyamäe: For a long time, Badoo used SVN as its version control system. SVN was used for most of Badoo's existence, and it is still used in some places. (Note: in the distant, distant past there was, of course, CVS.)
Anton Sergeev: Some listeners are probably starting to fume right now: how come, Subversion! But as far as I know, SVN is more typical of enterprise-type projects. You had a lot of experience with it going back to Mamba, so that is just how it happened historically, right?
Oleg Oyamäe: The move to Git began two and a half years ago. There was a kind of boom at the time and everyone started switching to it. But at Badoo it was a planned process, part of introducing testing in the company.
Anton Kopylov: Did you consider Mercurial as an alternative to Git?
Vladislav Chernov: As far as I know, we did not consider Mercurial as an alternative, because at that time Git had more plug-ins, repository management software, and so on. Plus, the Git community had already formed, while Mercurial's was only getting started. For me, for example, a particular plus is that Git can be used from any language, while with Mercurial that is harder. (Note from test director Ilya Ageev: we tried to introduce Mercurial many years ago, but it did not take root and we went back to SVN.)
Anton Kopylov: Yeah, good.
Oleg Oyamäe: Right. So, two and a half years ago the process of introducing quality control began. A department was created that is still growing and developing. At that time a new workflow was developed in which each task was done in a separate branch, but everything was still assembled by hand; there was no automation, or, by and large, anything else. Deployment was done, one might say, entirely by hand. There was a utility written a very, very long time ago; everything was deployed through it, and there was no monitoring. Over time a lot was built around this utility, dashboards and other things, and it became a kind of core. That is the short version.
Anton Kopylov: And were deployment and testing a single process for you, or were they separate processes?
Oleg Oyamäe: Previously, these processes were not really connected. Now it is full-fledged continuous integration: tests run after every commit, and the results directly affect deployment and staging. The two processes are now interconnected and affect each other.
Anton Sergeev: By the way, guys, I wanted to ask your opinion. Many projects start out with only a version control system and, say, some kind of hook on top of it. Usually nobody sets up powerful continuous integration and deployment tools right away. What do you think the turning point is? How much should a project grow before it becomes clear that it cannot do without proper continuous integration and tests?
Oleg Oyamäe: My personal opinion is that on any project you need to think about integrating these things from the very beginning, and even more so about writing tests. Tests probably come first. Especially if the project is written from scratch, it is worth thinking about test coverage at the very earliest stages of development.
Anton Sergeev: So if you were building Badoo again today, you would set up such a system immediately, and the very first commit would already go through continuous integration and pass all the tests?
Oleg Oyamäe: Well, I would, yes.
Vladislav Chernov: I would put it a little differently. If we are talking about the trendy word “startup”, then clearly nobody will build such a complex system, because nobody knows whether the startup will take off, tests consume considerable resources, and there are always tight deadlines for bringing the product to market. So it is probably not necessary to talk about automated testing from the very beginning of a project. But you do need to think about automated builds and, perhaps, some kind of versioning and automated deployment from the very beginning, because that does not take much of the team's time and costs little compared to automated tests.
Anton Sergeev: I see. Now tell us a little more about how you manage two releases a day, about the several stages of testing, the number of tests, and how it all works. And about rollbacks, patches, and hotfixes.
Vladislav Chernov: Let me first go back to the history a little. Two releases a day is real; about 50-60 tasks ship per day. Everyone is probably curious why exactly two releases. It is very simple. At first there was one release per week or one every few days, then one release every day, and then two releases per day. Why two?
At some point we switched to a flow where each task is done in a separate branch. When we form the release, we merge these tasks into the release branch, re-check each task, and also check how the code fits together (integration testing). With 10 to 30-40 tasks at most, that is quite easy to do. When you have to sort through 100 tasks in one release, it is much harder. So we deploy twice a day. We could deploy more often, but that would not make much sense.
Why each task lives in a separate branch is probably also clear. We have several stages of testing, about five of them. Each task is in a separate branch because that way we can roll it back and check it on its own: what we get is the production code plus this one task, and we test it at several stages. The first stage is standard: code review, where only this task is looked at. The finished task is then checked in the development environment, which is our mini server environment with virtual machines, databases, and so on. There you can check things that cannot be verified in production. After that we create a mini-staging for each task, called a shot, which uses the production database, and we check the task again. Then the task goes into the release and is checked in the release branch for how it integrates with the rest of the code and whether it breaks other tasks. There the task is checked for the fourth time. Finally, there are optional checks after production: we test the task at the request of a product manager, or, if the task is very important, testers re-check it in production. But these are optional.
Anton Sergeev: How does your code review work? Is it done by dedicated people, or do developers review each other's code at random?
Vladislav Chernov: It works differently in each department. When a task goes to review, a developer or a development team is chosen. In some groups, review is done by the component team leads. In others, tasks simply rotate within the group. It depends on the size of the department, on people's experience, and on tradition.
Anton Sergeev: Guys, let me ask some questions that listeners sent in. Stanislav, for example, sent us many different questions, and I split them into several groups. For instance, he asks who makes the final decision on rolling out a release: a soulless machine or some counterpart of Sergey Didyk.
Oleg Oyamäe: No, we do not have a Sergey Didyk, he works at another company (note: this refers to the company Runner). We release twice a day at fixed times, in the morning and in the evening. We collect tasks into the release up to that time, then the automerge stops and we start testing these tasks. When the tasks have been tested and the time to roll out comes, the release engineer makes the decision and we go to production.
Anton Sergeev: What do you roll out with? And how do you roll back if something goes wrong? How far back in time can you roll back, if needed?
Vladislav Chernov: For rollouts we have our own deployment tool, written by our developer Yuri Nasretdinov. If you want to know more, Yuri has given several talks and written an article about it. The utility rolls out to a thousand servers very quickly, literally in 2-3 minutes. This is our deployment system, and we are constantly improving it.
(Note: a rollback is the same as a rollout, only with a different version number. This way we can roll out and roll back in a few minutes. Rolling back is even faster, because the previous version stays on the servers: you only need to switch the link, which takes a few ssh commands to switch it and flush the cache.)
Anton Sergeev: Is it on GitHub somewhere? Can ordinary people try it out?
Vladislav Chernov: No, it is not, and as for why, that is probably a question for Yuri Nasretdinov.
Anton Sergeev: I see. We hope it gets published. It would be interesting to try, of course.
Vladislav Chernov: Yes.
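The rollback described in the note above, where the previous version stays on the servers and only a link is switched, might look roughly like the following Python sketch. It assumes the link is a symlink (the note only says “link”); the directory layout and the `switch_release` name are hypothetical, and the cache flush and the ssh fan-out to each server are left out.

```python
import os


def switch_release(current_link: str, release_dir: str) -> None:
    """Repoint `current_link` (a symlink) at `release_dir` in one atomic step.

    Hypothetical helper: a rollout and a rollback are the same call,
    only the target version directory differs.
    """
    tmp_link = current_link + ".tmp"
    # Remove a leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Create the new link under a temporary name first...
    os.symlink(release_dir, tmp_link)
    # ...then rename it over the live link. On POSIX, rename is atomic,
    # so readers see either the old target or the new one, never a gap.
    os.replace(tmp_link, current_link)
```

Rolling back is then just `switch_release(link, previous_release_dir)`, which matches the remark that a rollback is faster because the previous version is already on disk.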
Anton Kopylov: I have another question. You say there is a testing stage where the code is checked on staging against the production database. How do you transfer data from production to staging? And if a problem is found, what do you do about it?
Oleg Oyamäe: What kind of problem do you mean?
Anton Kopylov: Can I test on staging with, say, one million users rather than a hundred million? Or do you copy the entire database and verify the code against all of it?
Oleg Oyamäe: No, it does not use a copy; it uses the real, full-fledged production database.
Anton Kopylov: Ahh.
Oleg Oyamäe: Only the code contains new changes; the database is the production one.
Anton Kopylov: I see. Isn't it scary to connect staging to a production database?
Oleg Oyamäe: No, because testing is done with test users, so even if something goes wrong it will not affect or disturb real users (plus, the task has already been tested twice before that).
Anton Sergeev: By the way, our company uses a similar approach. We are trying to move away from it, but at the same time we are wondering whether to keep it. Common sense is the main thing here: if you do nothing terrible, if you don't drop collections, for example, then everything will probably be fine.
Stas also asks whether you roll out to part of the cluster. Are users pinned to it, or can it be any part of the cluster? As I understand it, the question is whether you can roll out to part of the cluster, test there, and then roll out to everyone.
Vladislav Chernov: I can answer this way: we can roll out to any machine and to any cluster. We use about 10 clusters, and the utility can roll code out to any part of a cluster. As for rolling out to a percentage and testing somewhere: we mostly do that when we roll out a new feature and want to look at the results. In that case we roll out to a separate country rather than to some set of machines, because that is much more convenient.
Anton Sergeev: I see. And how do you roll back? Suppose a feature is in the release branch and something goes wrong during testing on staging. For example, you find a serious problem and it would be best to pull the feature out for now. Tell us how you handle this in the version control system (Git): do you use git rebase or git revert? And if so, how?
Vladislav Chernov: We use git rebase. git revert does not suit us, since we develop each task in a separate branch: if we rolled a task back with revert, then after the release branch is merged into master the developer would have to revert the revert. So we use git rebase. We roll the task back right away, build a new build, and deploy it to staging.
Anton Sergeev: Have you had any problems with git rebase? As far as I know, it is a rather temperamental thing and you need to know how to use it properly. What is your recipe?
Vladislav Chernov: In fact it is not that temperamental, and the recipe is very simple: we merge the other branches into the release branch, so the release branch tree is very simple, and rolling a task back is easy, since the task is one self-contained commit there. We understand that git rebase is, by nature, a manual operation. But we have an algorithm and a small script that performs it automatically, well, almost automatically.
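As a rough illustration of the rebase-based rollback described above, the Python sketch below builds the git command that drops one task from a release branch. It assumes each task enters the release branch as a single merge commit, which the conversation does not spell out; the function name and its arguments are hypothetical, and this is not Badoo's actual script.

```python
from typing import List


def drop_task_command(task_merge_commit: str, release_branch: str) -> List[str]:
    """Build a git command that replays everything merged after the task
    onto the commit just before the task's merge, leaving the task out."""
    return [
        "git", "rebase",
        "--rebase-merges",                  # keep later task merges as merges
        "--onto", f"{task_merge_commit}^",  # new base: first parent of the task's merge
        task_merge_commit,                  # upstream: skip the merge and the task's commits
        release_branch,                     # branch to rewrite
    ]
```

The list can be passed to `subprocess.run(..., check=True)` inside a repository. Because rebase rewrites history, the rewritten branch then has to replace the old one wherever it is shared.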
Anton Sergeev: We already know that you use Git, but you also use TeamCity, a wonderful continuous integration server from JetBrains, and as I understand it you have integrated it all with the JIRA bug tracker, including task assignment and automatic status changes. How did all this come about, and how do you work with it?
Oleg Oyamäe: Yes, as already mentioned, we use Git as well as JIRA and TeamCity. We use TeamCity to run tests and, after a successful run, to deploy the code to staging. Our flow in JIRA is well developed and structured, and JIRA, Git, and TeamCity are all integrated with each other. We have a very large workflow that we regularly optimize to keep it convenient for developers, testers, and product managers.
Anton Sergeev: By the way, at the last DevConf conference there was a talk about problems in architecture, and the speaker said that JIRA should not be used under any circumstances, because you cannot assign view permissions per user within a task so that, for example, developers do not see the extra meta-information a project manager needs, and the project manager does not see details that are useful only to developers. How critical is this, in your opinion, and what problems have you run into with JIRA? Were there any really serious things that you had to fix yourselves?
Vladislav Chernov: That problem between the project manager and the developer belongs to a rather closed kind of setup. We are all friends here and have no hidden information, so we do not have that problem with fields and whether to hide them or not. As for whether to use JIRA as a bug tracker, everyone decides for themselves, but it is at least the most widespread bug tracker in the world. In some ways it suits us, in others it does not. Where it does not, we automate our work with it. We actually have a complicated workflow, while JIRA has only two levels of nesting, task and subtask, and this gives us more pluses than minuses, because it keeps us from overcomplicating the process.
Anton Sergeev: I see. Do you use Confluence? In case our listeners do not know, Confluence is a wiki where you can store documentation and more.
Vladislav Chernov: Yes, of course we use it. And I may be mistaken, but as far as I know we even use GreenHopper. But we do not use FishEye for viewing code, because it is a very cumbersome tool. For code review we use GitPHP, which, again, we extended ourselves, and it is quite fast compared to FishEye, which takes days to index our huge number of branches.
Anton Sergeev: What is your impression of working with TeamCity? Have you used anything else, have you tried Jenkins, for example?
Vladislav Chernov: Yes, I have worked with many continuous integration servers: Jenkins, TeamCity, and Bamboo. But TeamCity suits us better than the other continuous integration servers currently on the market.
Anton Sergeev: Can you tell us in more detail why? Is it because of your current process and release rollouts, or are there features that other developers could take into account too? Say a developer is listening to us right now and wondering what to choose. Are there important factors in favor of TeamCity?
Vladislav Chernov: One advantage of TeamCity is that its free functionality is enough for small companies. On the other hand, it has paid features and paid support, and it is a high-quality product. Everyone says Jenkins is good because there are so many plugins for it. That is true, but if you rely on some plugin and then a new version of Jenkins comes out, it can happen that the plugin is no longer compatible, you run into problems, and you have to rebuild everything again.
TeamCity won us over at least because it is quite simple to use and convenient not only for release engineers, but also for testers and developers. Plus, at that time it had certain features that suited us perfectly. One example is picking up branches by mask: we have a huge number of branches, and we do not have to reconfigure the builds. Later these features were added to other continuous integration servers too. And, of course, their support helps us a great deal: they modify the product and finish some things for us.
Anton Kopylov: I also wanted to add that, to me, one of TeamCity's killer features is configuration inheritance. You can make a configuration template and then create clones from it. This greatly simplifies adding subprojects to the system.
Vladislav Chernov: Yes. For example, if you use one repository and have no custom configurations, you can set some things up as a template and then pass in parameters or something else. That is also an interesting feature.
Anton Kopylov: That is exactly what competitors do not have: configuration inheritance, a TeamCity-specific feature.
Anton Sergeev: I still have a question about Git. Everyone knows, or has at least heard of, the so-called git-flow. It is a kind of standard approach to using Git; it has been around for a long time and is a successful branching model. Is your flow based on this standard approach, with feature branches, release branches, hotfix branches, master, and a develop branch, or do you have tricks of your own, some good recipes for how best to create and merge branches? What can you tell us about that?
Vladislav Chernov: Oh, sure. In fact, we are far from git-flow. Our setup is simple. There is master, which contains a copy of the production code. Nobody has the right to push there, but patches can be applied to it; we have a special tool for that called Deploy Dashboard. The release engineer looks at the patch and, if everything is fine, applies it semi-automatically in the web interface, after which the code is deployed to production and fixes something there. This happens quite rarely, but of course it happens. Then we have the release branch, which is created twice a day fully automatically, plus there are task branches. Task branches are named after JIRA tickets: the ticket number plus a short description of the branch. So a ticket is created, and the branch goes through the flow together with the ticket: it is developed, goes to review, and is checked at several stages of testing. Then it is automatically merged into the release branch, and at that point the continuous integration server kicks in and builds the build. Although our code is PHP, it really does get built. For example, translations are processed, which are generated automatically for more than 40 languages, and everything is rolled out to staging and to the environment, all automatically. When the release is finished, it is merged into master and a new release is created automatically.
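The branch naming convention mentioned above (a JIRA ticket number plus a short description) is what lets automation tie a branch back to its ticket. A minimal Python sketch of that lookup follows; the exact pattern `PROJECT-123_short_description`, the regular expression, and the `parse_task_branch` name are assumptions for illustration, since the conversation does not give the precise format.

```python
import re
from typing import NamedTuple, Optional

# Assumed format: JIRA key, underscore, lowercase description,
# e.g. "WEB-1234_fix_login". The real convention may differ.
BRANCH_RE = re.compile(
    r"^(?P<ticket>[A-Z][A-Z0-9]*-\d+)_(?P<description>[a-z0-9_]+)$"
)


class TaskBranch(NamedTuple):
    ticket: str
    description: str


def parse_task_branch(name: str) -> Optional[TaskBranch]:
    """Return the ticket and description for a task branch,
    or None for branches that do not follow the convention."""
    match = BRANCH_RE.match(name)
    if match is None:
        return None
    return TaskBranch(match.group("ticket"), match.group("description"))
```

A tool in the role of AIDA could use such a lookup to decide which ticket to update when a branch is pushed, and ignore branches such as `master` that return `None`.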
Anton Kopylov: A question came up: do you keep any kind of numbering? With two releases per day you must have accumulated a lot of releases by now, or does the number not grow that fast?
Oleg Oyamäe: Not really. TeamCity does have an internal counter, but it is mostly used for working with TeamCity itself. The release names themselves have a more human-readable format that includes the date, the time, and the project, because we deploy not only the web but, recently, also iOS and other projects. So we do not have the problem of, say, a “release 365” where nobody understands what it is or where it came from.
Anton Sergeev: Yes, that problem exists. I have been in a situation where you have a task in, say, Redmine, and you write that it is such-and-such a feature. Then you look at the release, and there is a whole bunch of features. You say, “Well, which one is mine?” And you have to go into the browser to look it up.
I believe that each of the tools you use, be it JIRA, TeamCity, or Git, is good on its own, but, as your example shows, they work best and most effectively when integrated. And for that you use a tool called AIDA, right? Tell us how you do it, how you came to it, and what you gain from it.
Oleg Oyamäe: Yes, we use AIDA. It is a kind of virtual user on whose behalf all automatic actions take place: scripts are launched, notifications are sent to Jabber and by email, and information is updated in the dashboard. In other words, the whole process is automated, and all the information is available both in TeamCity and in JIRA. We also have a set format for commit messages, so when we look at the Git log everything is visible and clear there too. In addition, each JIRA task has links to GitPHP, so you can always see both the full diff and each commit separately. So yes, using these three tools together really is convenient and justified.
Anton Sergeev: How difficult is it to start working with AIDA, or is it a very simple tool, not some rocket science?
Oleg Oyamäe: Let's put it this way: there are simple parts, but there are also some quite intricate ones.
Anton Sergeev: What were the most serious problems you encountered when using AIDA and integrating with TeamCity, JIRA, Git?
Oleg Oyamäe: Well, look, in many cases AIDA uses the JIRA and TeamCity APIs, and it works with Git as a regular user. The first thing we ran into was, of course, the API. It changes from version to version, and you cannot say it is always stable. That was the first place we hit difficulties. As for introducing AIDA, the question is how to set up the business process correctly: for example, at what point to cut the release branch, at what point to automerge, and so on. Notifications also have to be set up correctly, because people must understand how the process is going; if we miss something, AIDA will keep working without us, and that will be an uncontrolled process. Building this process is the biggest difficulty.
Anton Sergeev: So all of this gets debugged, written, run, reviewed, and so on until it more or less settles into a normal working rhythm, right?
Oleg Oyamäe: Yes. We have a special environment in which we develop and test AIDA, so in most cases improvements and changes go smoothly and nothing breaks. In that respect all is well.
Anton Sergeev: I see. Are there any plans for the future, maybe some interesting technologies in terms of rolling out, deploying, automation that you are looking at, or that you see on the market right now? Or are you still more than happy with your tools and would not like to change anything?
Oleg Oyamäe: Yes, for now everything suits us; we keep improving AIDA and everything is fine.
Anton Sergeev: Super. Then I suggest we move on to the next big topic. Let's talk about testing: how you do automated testing, what you use for it, how it all works, and what numbers you have managed to achieve.
Vladislav Chernov: Yes, we use automated testing. To start from the very beginning, developers write unit tests. Right now a task is not accepted into a release unless it is covered by unit tests. We use Selenium for the PHP code and we measure code coverage; this is fairly standard. At some point the two releases per day caused a rather big problem: the time is limited, there are literally a few hours for testing. So we no longer have a sequential process; we do everything in parallel. For example, when there is a push to the release branch, the build starts automatically, and the unit tests and Selenium tests start automatically and in parallel, along with, optionally, some separate tests that are very important. And here we ran into a problem.
Our wonderful colleague Ilya Kudinov wrote a utility we call “Startup”, which splits these tests into 11 threads and balances them by duration so that every thread takes about the same time. Now the tests run in about 3.5 minutes, and that is 18 thousand unit tests. Tests are run not only on the release branch but also, again automatically, after a developer has finished a task in its branch. A report is written to JIRA, and the developer or tester can immediately see the result of the run: what failed, what did not, and so on.
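One common way to split tests into a fixed number of threads so that each thread takes about the same time is a greedy "longest test first" assignment. The Python sketch below shows that idea; it is not the actual algorithm of the utility described above, which the conversation does not detail, and the `split_tests` name and the use of previously measured durations are assumptions.

```python
import heapq
from typing import Dict, List


def split_tests(durations: Dict[str, float], threads: int = 11) -> List[List[str]]:
    """Greedily assign tests to `threads` buckets, balancing total duration.

    `durations` maps a test name to its last measured run time in seconds.
    """
    # Min-heap of (total seconds assigned so far, bucket index).
    heap = [(0.0, index) for index in range(threads)]
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(threads)]

    # Place the longest tests first; each goes to the least loaded bucket.
    by_duration = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for name, seconds in by_duration:
        total, index = heapq.heappop(heap)
        buckets[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return buckets
```

With balanced buckets, the wall-clock time of the whole run is close to the total test time divided by the number of threads, which is the effect described above.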
Anton Sergeev: I see. On Selenium there is another question from Stanislav: do you have a big Selenium farm?
Vladislav Chernov: I could be wrong, but as far as I know the farm is big enough. That is because we support a fairly large number of browsers; we only recently dropped Internet Explorer 6. So we test on almost everything: Chrome, Firefox, and Internet Explorer. And the farm itself is quite large. We also parallelize the tests, because they take a rather long time. We are trying to use browsers and engines without a GUI, but I cannot boast that this has been very successful.
Anton Sergeev: I see. By the way, this reminds me: I was at the Mail.ru Technology Forum, and there was a talk from either Yandex or Mail.ru (honestly, I forget) about how they use Selenium. The gist was that at first they launched a separate node for each browser, which, as I understand it, is some kind of virtual machine such as Xen, and that turned out to be quite resource-intensive and not very good. Selenium also discourages running tests for different browsers on one node at the same time, because things will break, there will be glitches, and in general there will be problems. So they dug deep, hacked on something there, and worked out some quirks of that same Internet Explorer, like having to force the focus back or do something else so that the tests would not all fail when run in parallel. They talked about it and said it was mega cool that they had managed to make the whole system work so that everything started passing quickly. Have you ever dabbled in such things or run into this? And how do you feel about crutches like that?
Vladislav Chernov: I do not know what actually happens inside the browsers; we do not write the Selenium tests ourselves and do not know such details. But as I see it, some browsers are updated quite often, and then those crutches have to be carried over from one browser to another, at the very least. As for failed tests: Selenium tests are tests that do fail from time to time, and the most common way to deal with that is to run them again and see whether they fail or not. As for who does what, for example separate virtual machines: we do not use that, because taking a dedicated machine for each individual browser is too resource-consuming. But, interestingly, Google, for example, runs Selenium tests in its data center.
Anton Sergeev: Wow.
Vladislav Chernov: Yes, and the machines are at a production site, a .com site. It takes the same machines that users use and runs Selenium tests there, so everyone has different resources. Still, we try to make everything less resource-intensive.
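The "run it again and see" approach to unstable Selenium tests mentioned above can be sketched in Python as follows. The `run_with_retries` name, the three status labels, and the representation of a test as a callable returning `True` on success are assumptions for illustration, not a description of Badoo's tooling.

```python
from typing import Callable, Dict


def run_with_retries(
    tests: Dict[str, Callable[[], bool]], max_attempts: int = 2
) -> Dict[str, str]:
    """Run each test up to `max_attempts` times and classify the outcome.

    "passed": succeeded on the first attempt
    "flaky":  failed at first, then succeeded on a rerun
    "failed": failed on every attempt
    """
    results: Dict[str, str] = {}
    for name, test in tests.items():
        status = "failed"
        for attempt in range(1, max_attempts + 1):
            if test():
                status = "passed" if attempt == 1 else "flaky"
                break
        results[name] = status
    return results
```

Reporting reruns separately as "flaky" rather than folding them into "passed" keeps unstable tests visible, so they can be fixed instead of silently retried forever.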
Anton Sergeev: We still have questions from the audience. What types of tests do you write?
Vladislav Chernov: Look, at the very least we have unit tests and, of course, functional tests. That is, we go into the database and look at it there on devel. We have Selenium, which runs on devel, on both sites. We have two data centers, and the devel infrastructure fully mirrors the production infrastructure: there are two sites there as well, and database replication on devel is the same as in production. So everything that exists in production now exists in the development environment. We run Selenium tests on devel, on staging, and, at some critical moments, on production. For PHP, that is probably all.
Anton Sergeev: Then the question is: are the unit tests truly unit tests, without a DB? If so, how do you test a tricky SQL query with a couple of joins and the code that handles it?
Oleg Oyamäe: We try to write unit tests as real unit tests. For tricky queries there are functional tests, which are exactly what is needed there. A unit test should be a unit test and check only one specific piece.
Anton Sergeev: There is still a question from Stas (he asked a lot of questions, and I can’t resist - I must answer the person, if he is interested). Do you use A / B testing and how do you test performance?
Oleg Oyamäe: Yes, we use A/B tests. Admittedly, this is handled by another department, but Badoo uses A/B testing to understand which version of a text, a design or, for example, a button is better and more pleasant for the user, so that the service becomes even more convenient and understandable. There are also special automation tools for this.
Anton Sergeev: And how do you test performance? As I understand it, the question is not about the performance of the tests themselves, since you said that you run 18 thousand tests in 3.5 minutes, right?
Oleg Oyamäe: Yes. That's right.
Anton Sergeev: I mean performance in the sense of, say, the response time in production for some specific request.
Oleg Oyamäe: I'm not sure I understood the question correctly. But we have a monitoring department that continuously, 24 hours a day, 7 days a week, monitors the state of the entire system, absolutely all of its parts, as well as page response times and how quickly static content is served. If something goes wrong, the problem is immediately escalated to the responsible person, then localized and fixed as soon as possible.
Anton Sergeev: Let's talk about code coverage: what are your requirements for it, how do you work with it, and how much of the code do you cover? As I understand it, the main rule is that code without coverage does not get into a release. But how do you monitor that, and how many tests do you try to write?
Oleg Oyamäe: All new functionality must be covered by tests: when a developer submits a task for testing, he must cover all new or changed methods with unit tests. The task then goes to QA, who write Selenium tests and functional tests. Coverage is calculated once a day, and department heads, testers and all other interested parties can see in a special web interface which group has good coverage and draw the appropriate conclusions: either tasks are created to write more unit tests, or the testing department creates tasks to write more Selenium tests. All of this is an ongoing process of improving the code and the coverage.
Anton Sergeev: Then let's talk about your “launcher”, which you only mentioned in passing. I'd like to learn more about how much it improved your performance, what it was before and what it is now, and to hear in more detail about the architectural and philosophical approaches used in developing it.
Vladislav Chernov: We looked at several schemes, including the simplest ones, for example splitting the suites into chunks of some number n of tests and running those, and various other options. They are all described, and you can read about them. As for the gain and how it was implemented, I'll go into more detail now. The gain was substantial: if I'm not mistaken, the tests used to run for around 20 minutes, maybe even more, and now it is 3.5 minutes, a huge saving in time. As for the implementation, we split the run into 11 threads, and the suites are formed using timings from a dedicated database, which is built from the TeamCity server database. We know how long each test runs: we collect this information over, if I'm not mistaken, 7 days, take an average, and so know how long each test takes on average. Based on these statistics, 11 threads of equal duration are generated and the run proceeds. The threads are completely balanced, and there is even a forecast of how long the tests will run, a sort of look into the future.
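The balancing idea described above can be sketched in Python. This is not the launcher's actual code: the greedy "longest test first, into the least loaded thread" strategy, the function names and the shape of the `history` data are all assumptions chosen for the illustration. Only the inputs (average durations collected over several days) and the goal (11 evenly loaded threads plus a forecast of the total run time) come from the description.

```python
import heapq
from statistics import mean


def average_durations(history):
    """Map each test name to its mean duration over the recorded runs."""
    return {name: mean(samples) for name, samples in history.items()}


def balance(durations, n_threads=11):
    """Greedy split: assign each test, longest first, to the least loaded thread.

    Returns (buckets, loads): the test names per thread and the predicted
    total duration of each thread.
    """
    heap = [(0.0, i) for i in range(n_threads)]  # (current load, thread index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(n_threads)]
    ordered = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in ordered:
        load, i = heapq.heappop(heap)  # least loaded thread so far
        buckets[i].append(name)
        heapq.heappush(heap, (load + seconds, i))
    loads = [0.0] * n_threads
    for load, i in heap:
        loads[i] = load
    return buckets, loads


def predicted_wall_time(loads):
    """The whole run lasts as long as its slowest thread."""
    return max(loads)
```

For example, with `history = {"a": [4, 6], "b": [3, 3], "c": [2, 2], "d": [1, 1]}` and two threads, the sketch puts `a` and `d` in one thread and `b` and `c` in the other, and predicts a run time of 6 seconds.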
Anton Sergeev: I see. Are there any thoughts on how this can be improved and where you can go from here? Maybe in the future you will release it as open source so that others can use it and enjoy it?
Vladislav Chernov: Yes, Ilya Kudinov promised that we would be able to do this in the next few months. It's just that right now it is somewhat customized for us, so we need to adapt it, for Jenkins, for example, and release it as open source. I think this will be done, and everyone will be able to try it and play around with it.
Anton Sergeev: Great. You are doing a great job, of course, because lately you have been talking a lot at conferences about how you do development, you write articles on Habr and, again, you are happy to come on our podcast. This is very interesting, and I think many developers want to hear about your tools. There are technologies that large companies and major market players are releasing now, and many of them, frankly, are somewhat questionable, and many are not even used by anyone; they are only talked about at conferences. I won't name anything specific, but my opinion about certain things and systems is that they are too custom and not very interesting to the community. In your case I think this is not so at all, so I wish you luck in finishing these tools and publishing them on GitHub or elsewhere. We have a little time left, so let's talk about the development environment. You say that you have a full copy of production right in the office where the developers sit. How do you approach the development environment: does each developer have a sandbox of his own, what is this copy of production, and how is it structured?
Anton Kopylov: And, yes, do you use any tools for reproducing the environment, such as Vagrant or Puppet?
Oleg Oyamäe: Yes. We have two main data centers in production, in Prague and in Miami. Accordingly, we need to recreate two sites on devel so that we can test things that span both sites. Our database is an exact copy: the structure is absolutely the same. The whole configuration is also identical, which lets us develop and test code without any problems; all services are up, and it is quite comfortable for developers to work.
Anton Sergeev: Is it a single configuration? That is, do you use Puppet or Chef there?
Oleg Oyamäe: Yes, Puppet is used for production and for everything else; everything, including all configs, is deployed through it. Thanks to this, all our configs are uniform, and there are no problems with one config being on one machine and a different one on another.
Anton Sergeev: That's a cool approach, by the way. We, for example, also use Puppet, but we created a separate config for development. Why? Because our development environment differs slightly from production. For example, we put MongoDB on the same machine that serves the API or the web frontend. So our configs differ slightly, but overall we are happy too, because we can update them quickly. If we update, say, PHP or the Mongo driver, it takes little time. But what about the developers' machines? By the way, do you historically have more developers on Windows, on Macs or on Linux, and how do you do development and initial testing on local machines?
Oleg Oyamäe: In fact, in most cases development is done on the DEV servers. Some developers run a local nginx or something else, but probably only for very specialized, small tasks. It is impossible to bring up the whole environment on a local machine, and there is simply no need to. Developers work in their home directories on devel, and each developer has his own virtual host that maps to a special folder in his home directory. As for operating systems, I think I have seen only one developer on Windows in the office, maybe two; it is mostly Macs and Linux. I don't know the exact split, but I think it is about 50/50.
Anton Sergeev: That's interesting. In your case it doesn't really matter what the developer is working on, because he does everything on devel anyway; the only difference is that he may have different tools, such as a GUI for accessing the database, depending on the platform. And as far as I know, many good developers increasingly rely on native tools: they access the database from the console, so there is no problem there either. The reason I asked is that there is a practice where the development environment is brought up under, say, Vagrant. The developer first writes and saves his code to a Vagrant virtual machine running, say, some Linux server, and only then is it pushed to a DEV server. But as I understand it, you don't have this extra link; you develop and test directly on devel.
Vladislav Chernov: Yes, that's right, because our server room is in fact just a set of virtual machines. It is the only place where we currently use cloud technology. We use virtual servers, but that is the devel environment; we don't bring up virtual machines for specific tasks, except where it is clear that we need them, such as a particular operating system or something similar. That is, of course, for testing and for different browsers; there we bring up virtual machines just for such specific tasks. Our devel is really sufficient, especially since it supports the huge amount of functionality that we have in production. This is much more convenient than bringing up a virtual machine for each specific task.
Anton Sergeev: It's great that you worked this out and maintain a proper DEV environment, because, as far as I know, in medium-sized companies this is a constant headache, especially when code is tested on a platform different from the one it will run on. Versions can differ, and sometimes even a small difference in versions can break everything in production. This matters a lot, because a great deal of time is spent on investigating and fixing. And if this happens all the time, then the testing procedure and the environment are wrong.
And just before we wrap up today's conversation, please give some recommendations of your own: which books and articles to read and which conferences to attend for those interested in testing, automation, deployment, or building large distributed systems like yours.
Vladislav Chernov: Well, at the very least subscribe to our blog on Habr; maybe we will tell you something else interesting there. We have many interesting articles there. Even if we are not talking specifically about continuous integration, it is still worth reading Bergman, who at least gives some initial knowledge and basic concepts, as well as these books:
- Jez Humble & David Farley "Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation";
- Paul M. Duvall, Steve Matyas & Andrew Glover "Continuous Integration: Improving Software Quality and Reducing Risk";
- Alan Berg "Jenkins Continuous Integration Cookbook".
They are quite interesting. As for large systems, to be honest, few people are willing to talk about their business processes, configuration management and release engineering, and that is very sad.
Anton Sergeev: Probably because every project is unique and has its own tricks and know-how. But many companies with high-load projects are indeed becoming more and more closed. VKontakte is an example: they share very little information, and their appearances at conferences are always a big hit, yet it is very hard to get anything out of them. From my experience at HighLoad, I can say that information about them comes out only in small portions. So many thanks for your articles on Habr; they are very interesting, and I have in fact borrowed some ideas and solutions from them for my own project.
Anton Sergeev: Well, that's all we wanted to tell you about today, dear listeners. Thank you for sending questions. So far only some listeners send them to us, but I think that over time others will also formulate questions and put them to our guests. We will try to leave more time between announcing an episode and actually recording it, so that you have time to think about what else to ask. I remind you that today's hosts were Anton Kopylov and me, Anton Sergeev.
Anton Kopylov: And our guests from Badoo were release engineers Oleg Oyamäe and Vladislav Chernov. Thanks a lot, guys.
Anton Sergeev: Yes, thanks for the interesting story; it was really great.
Oleg Oyamäe: Thank you for the interview.
Anton Sergeev: Listen to our episodes on the site www.itkompot.ru and on the podfm.ru podcast platform, and subscribe on iTunes. Good luck with your development, and until next time. Bye!