Retrospective of automation and changes in the development processes of Timeweb

On November 1, 2017, I became the head of the development team in the Timeweb software development department. And on November 12, 2018, the head of the department asked when the article for Habrahabr will be ready, because the marketing department asks, the volunteers are over, and the content plan requires something else)

Therefore, I want to give a retrospective of how the processes of development, testing and delivery of our products changed over the last year. About inherited processes and tools, docker, gitlab and how we are developing.

The hoster Timeweb exists since 2006. All this time, the company has invested a lot of effort to provide customers with a unique and convenient service that would distinguish it from competitors. Timeweb has its own mobile applications, a web-based email interface, virtual hosting control panels, VDS, an affiliate program, its own support tools and much more.

There are about 250 projects in our gitlab: these are client applications, internal tools, libraries, configuration storage. Dozens of them are actively developed and supported: they commit within the work week, test, collect, release.

In addition to the large amount of inherited code, all this pulls the corresponding number of inherited processes and related tools. Like any legacy, they also need to be maintained, optimized, refactored, and sometimes replaced.

From all this abundance of projects are closest to the clients hosting the control panel. And it is in the “Control Panel” project that we often run in various infrastructure improvements and make a lot of effort to keep the connected infrastructure in shape. Spreading the experience and favorite practices to other products and their teams.

I will tell you about different changes in tools and processes over the past year.

Vagrant → docker-compose

Problem

On the first working day, I tried to raise the control panel locally. At that time, there were five web applications in one repository:

- Virtual hosting 3.0
PU , - VDS 2.0
PU,
- Webmasters PU, - STAFF (support tool),
- Guidelines (standard demo front-end components).

Vagrant was used locally to run. In Vagrant, it was run ansible. To start and configure it took the help of colleagues and about a day clean time. I had to install a special version of the Virtual Box (there were problems on the current stable one), the work from the console inside the virtual machine was unnerving: trivial commands like npm / composer install significantly slowed down.

The performance of the applications themselves in the virtual machine was far from possible, given the technology stack used and the power of the machine. Not to mention that a virtual machine is a virtual machine, and by definition it occupies a significant part of the resources of your PC.

Decision

The local development environment was rewritten to run in docker containers. Docker-based containerization is the most common solution for isolating the environment of applications at all stages of its life cycle. Therefore, there are no special alternatives.

findings

Of the benefits:

- locally the application has become more responsive, containers require less than the VM,
- launching a new instance, as practice has shown, takes a few minutes and only docker (-compose) requires no less than certain versions. After cloning, just run:

make install-dev
make run-dev

Not without compromise:

- I had to write shell-binding for dokurizirovannyh teams (composer, npm, etc.). They, like docker-compose.yml, are not completely cross-platform, in comparison with Vagrant. For example, launching on a Mac requires additional efforts, and on Windows, it will probably be easier to run the linux distribution with the docker in the virtual Linux. But this is an acceptable compromise, because the command uses only debian-based distributions, this is an acceptable restriction for commercial development,
- a container based on github.com/jwilder/nginx-proxy is run locally to support virtual hosts . Not that a crutch, but additional software, which sometimes must be remembered, although it does not cause problems.

Yes, everyone in the team had to realize a little bit what a docker is. Although thanks to the mentioned shell scripts and Makefile, developers perform 95% of their tasks without thinking about containers, but in guaranteed identical environments.

newcp-dev → cp-stands

These strange phrases are the names of the machines with test panels of the control panels, new and old, respectively.

Problem

The ansible recipes were used exclusively inside Vagrant, so the main advantage was not achieved: the versions of the packages in the sale and on the stands differed from what the developers worked.

The inconsistency of the versions of the server software packages on the old stands with what the developers had led to problems. Synchronization was complicated by the fact that system administrators use a different configuration management system, and it is not possible to integrate it with the developers repository.

Decision

After containerization, it was not too difficult to expand the docker-compose configuration for use on test benches. A new machine was created, for the deployment of stands on DOCKER_HOST.

findings

Developers are now confident in the relevance of local and test environments.

TeamCity → gitlab-ci

Problems

Configuring projects in TeamCity is a painstaking and ungrateful process. The CI configuration was stored separately from the code, in xml, to which normal versioning is not applicable, and an overview of the changes. We also experienced problems with the stability of the build process on TeamCity agents.

Decision

Since gitlab was already used as a repository repository, starting to use its CI was not only logical, but also easy and pleasant. Now the entire CI / CD configuration is right in the repository.

Result

For the year, almost all the projects gathered by TeamCity safely moved to gitlab-ci. We were able to quickly implement a variety of features for automating CI / CD processes.
The clearest screenshots will be pipelines:

Fig. 1. feature-branch: includes all available automatic checks and tests. Upon completion, sends a comment with reference to the pipeline to the redmine task. Manual tasks for the assembly and launch of the stand with this branch.

Fig. 1. feature-branch: includes all available automatic checks and tests. Upon completion, sends a comment with reference to the pipeline to the redmine task. Manual tasks for the assembly and launch of the stand with this branch.

Ill. 1. feature-branch: includes all available automatic checks and tests. Upon completion, sends a comment with reference to the pipeline to the redmine task. Manual tasks for the assembly and launch of the stand with this branch.

Ill. 2. develop a scheduled build with code freeze (checkout: rc): build develop on a schedule with code freeze. The assembly of the images for the stands of individual control panels takes place in parallel.

Ill. 2. develop a scheduled build with code freeze (checkout: rc): build develop on a schedule with code freeze. The assembly of the images for the stands of individual control panels takes place in parallel.

Fig. 2. develop a scheduled build with code freeze (checkout: rc): build develop on a schedule with code freeze. The assembly of the images for the stands of individual control panels takes place in parallel.

Ill. 3. tag pipeline: release one of the control panels. Manual task for rollback release.

In addition, from gitlab-ci there is a change of statuses and the assignment of a person responsible in redmine at the stages In Progress → Review → QA, notification in Slack about releases and updates of staging and rollbacks.

This is convenient, but we did not take into account one methodological point. Having implemented similar automation in one project, people quickly get used to it. And in the case of switching to another project where there is no such thing yet, or the process is different, you can forget to move and reassign the puzzle to redmine or leave a comment with reference to the Merge Request (which is also done by gitlab-ci), thus forcing the reviewer to look for the desired MR yourself. At the same time, you simply don’t want to copy pieces of .gitlab-ci.yml and the accompanying shell code between projects, since you have to support copy-paste.

Conclusion:Automation is good, but when it is the same at the level of all teams and projects, it is even better. I would be grateful to the respected public for ideas on how to organize a reuse of such a configuration beautifully.

Pipeline duration: 80 min → 8 min

Gradually, our CI began to occupy indecently much time. Testers suffered greatly from this: every fix in the master for release had to wait an hour. It looked like this:

Ill. 4. pipeline 80 lvl min duration.

Ill. 4. pipeline 80 ~~lvl~~ min duration.

It took a few days to dive into the analysis of slow places and search for ways to accelerate while maintaining functionality.

The most lengthy places in the process were installing npm-packages. Without any problems, they replaced it with yarn and saved up to 7 minutes in several places.

Abandoned automatic staging updates, preferred manual control of the state of this stand.

We also added several runners and divided into parallel tasks the assembly of application images and all the checks. After these optimizations, the pipeline of the main branch with the update of all the stands began to occupy in most cases 7-8 minutes.

Capistrano → deployer

For deployment in production and on the qa-booth, Capistrano was used (and continues to be used at the time of this writing). The main scenario for this tool: cloning the repository to the target server and performing all the tasks there.

Previously, the deployment was launched by the hands of a QA-engineer who has the necessary ssh-keys from Vagrant. Then, as far as giving up Vagrant, Capistrano moved to a separate container. Now the deployment is made from a container with Capistrano with gitlab-runners, marked with special tags and having the necessary keys, automatically when the necessary tags appear.

The problem here is that the whole build process:

a) significantly consumes combat server resources (especially node / gulp),
b) there is no way to keep the composer version, npm, up to date. node, etc.

It is more logical to build on a build server (in our case, this is gitlab-runner), and upload ready-made artifacts to the target server. This will relieve the combat server from the assembly utilities and alien responsibility.

Now we are considering deployer as a replacement for capistrano (because we don’t have any rubists, and neither have the desire to work with its DSL) and plan to transfer the assembly to the gitlab side. In some non-critical projects we have already managed to try it and are satisfied so far: it looks simpler, we have not encountered limitations.

Gitflow: rc-branches → tags

Development is carried out in weekly cycles. Over the course of five days, a new version is being developed: improvements and corrections are scheduled for release, scheduled for release next week. On Friday evening, code freeze automatically occurs. On Monday, testing of the new version begins, improvements are made, and by the mid-end of the working week, a release takes place.

Previously, we used branches with names like rc18-47, which means the release of the candidate of the 47th week of 2018. Code freeze consisted in checkout rc-branches from develop. But in October of this year they switched to tags. Tags have been set before, but after the fact, after the release and merger of rc with master. Now the appearance of the tag leads to automatic deployment, and freezing this merge develop into master.

So we got rid of unnecessary entities in git and variables in the process.
Now we are “pulling” projects that are lagging behind in the process to a similar workflow.

Conclusion

Automation of processes, their optimization, as well as development, is a permanent matter: while the product is actively developing and the team is working, there will be corresponding tasks. There are new ideas how to get rid of routine actions: features are implemented in gitlab-ci.

With the growth of applications, CI processes begin to take a prohibitively long time - it's time to work on their performance. As approaches and tools become obsolete, it is necessary to devote time to refactoring, reviewing them, and updating them.

Tags:

Retrospective of automation and changes in the development processes of Timeweb

Vagrant → docker-compose

Problem

Decision

findings

newcp-dev → cp-stands

Problem

Decision

findings

TeamCity → gitlab-ci

Problems

Decision

Result

Pipeline duration: 80 min → 8 min

Capistrano → deployer

Gitflow: rc-branches → tags

Conclusion

Also popular now: