nem April 10, 2018 at 14:27

Why the programmer Continuous Integration and where to start

Imagine that in Roscosmos they decided to assemble a new rocket without having blueprints and a clear understanding of how the rocket should be arranged. A separate plant deals with the rocket hull, a separate one produces engines, another nozzle. The chief manager of Roscosmos said that he trusted professionals, and expertly delegated all the work to the plants.

After a year, all the components are delivered to the main assembly shop, and it turns out that the engine does not enter the body, and the nozzles begin to melt even during test engine starts.

To prevent such garbage from happening, in real projects there is always a stage of planning and design, on which the specifications of how the parts will interact with each other and what characteristics they should have are fixed.

When developing software, we cannot afford a long design phase, because during this time, the business value of what we are trying to develop will be lost - competitors will stupidly pass us.

Therefore, teams developing program components (modules) are often forced to work without fully understanding how their module will interact with other parts.

As in the case of a rocket, when you try to release a new release of an application developed in parts by several teams, it may turn out that some of the modules are not compatible.

In 1991, Grady Butch was apparently tired of such an outrage, and suggested assembling the entire project every day to find out incompatibilities not on the release day, but earlier - and called this approach Continuous Integration.

Indeed, compiling a program is easier than assembling a rocket (especially from unfinished components), so why not start doing it once a day? In Extreme Programming, they decided to aggravate this topic, and arrange assembly several times a day.

What is an assembly?

Well, for example, you need to copy the modules in one place and start compiling the program.
If everything worked out, then the assembly can be considered successful, if not, then the team has an occasion to sort out the details, and solve the problem until everything has gone too far.

There is nothing to compile in interpreted languages like PHP, Python, and Ruby. The assembly in them can be the launch of unit tests, the deployment of a web application to a test server, and the run of acceptance tests on this test server.

In total, Continuous Integration is a practice. The purpose of the practice is to reduce the number of integration fakaps, to improve the quality of the software produced. The method is to launch the project assembly several times a day.

I admit that in ancient times Grad Butch began to practice CI generally by hand, running around the departments of his company, forcing everyone to give him floppy disks with the latest versions of the modules, and then with the tongue at the mercy it all compiled manually :) For

us in our 2018 it’s a little easier - the process CI is automated in a heap of systems - choose any one to your taste:

About CI systems

There are a lot of them . They are different. There are specialized for a specific programming language. There are mobile applications developed for development, there are built-in IDEs, there are settings via the GUI, and there are settings via the configuration file in the repository. Some work only in the cloud, some can be installed on your servers.

We will not try to cover them all now. Let us take only modern general-purpose systems, which are web applications.

Modern development implies that you are using a Version Control system.
Version Control is almost always git.
Git is almost always GitHub or a similar service.
Best practice of modern design - featurebranches and pullrequests.

CI just fit perfectly into the story of pullrequests.

You not only see all the changes, discuss the development of a specific feature in one place, you immediately show the status of the corresponding CI assembly:

(here the assembly failed - you need to go to Details, find out where it broke and repair)

Repaired, committed, started, and the process starts a new one - the CI system detects new changes on the branch and starts a new assembly:

The assembly can consist of several stages. The most typical are tests, compilation, deployment.

Naturally, the CI system needs to explain what these steps are.

Configuring CI Systems

All correct CI systems adhere to the principle of configuration as code, when the instructions for the CI system are just another file in the repository of your project. Somehow it happened that most systems use the YAML format for instructions.

The simplest instructions file for GitLab CI might look like this:

test_job:
  script:
    - ls -l

Slightly more complex instructions (Circle CI):

version: 2
jobs:
  test:
    docker:
      - image: circleci/ruby:2.4-node
    steps:
      - checkout
      - run: rspec
  deploy:
    docker:
      - image: circleci/ruby:2.4-node
    steps:
      - checkout
      - run: cap deploy

And here we recall that all these instructions should be launched somewhere.

What's on CI systems under the hood

For each push to the repository, the CI system creates a virtual machine inside which launches your instructions, takes the results of work, and extinguishes the machine:

A clean system is needed every time in order to limit the influence of external factors on the assembly.

Some CI systems use docker for this, some get out without it. Hint: take the one with the docker;)

(in reality, everything is somewhat more complicated, and most CI systems can work in a bunch of different modes, but it's better to start with the default mode - just launch tasks inside the docker container)

Why is this all for an ordinary developer?

The more aspects of development you are able to close, the more your value. If you just wrote the code, started it, and then even though the grass does not grow, this is one thing. If at the same time you take care of the tests and the deployment of the application, it’s completely different. If you automated all this in CI - the third one :)
Just understand that over time, CI will be the same default thing as Version Control.

I will give you a couple of situations from practice. It is possible that you have visited some of the following:

Situation number 1

Two developers develop on different branches. Each of them tests pass successfully, including after the merge with the master.

But as soon as both branches are merged into the master, the tests begin to fall. The problem is that without CI, such a code can easily go into production, and the presence of an error will be found out only there.

Situation number 2

A team of fifteen people is developing a web application. Vasily has a family holiday on his nose. He asked the authorities to leave early, having not checked the code very carefully, launched a deploy on the command line, and left work with his computer shut down and disconnecting the phone at the same time - the family is more important than work!

As a result, the production was broken, the deployment logs remained on the locked computer, the remaining developers tear their hair out in an attempt to understand what was darned, and what went wrong in the deployment process, and who is to blame.

If the team uses the CI system, then all deployment logs (and any other tasks) are stored in it, so you can always see what went wrong.

What CI system to start with?

Github gives us a good hint : an

open source project on GitHub most likely runs tests in Travis CI. You can’t face it. Ok, take a note of Travis CI, but google and read manuals until we run.

Let's take a look at the findings of a recent study by The Forrester Wave about Continuous Integration Tools :

In a nutshell, we examined the current capabilities of the CI system, the potential for development, and the current market size.
Who is interested in the testing methodology, read the study itself. We are interested in the conclusions.

The leaders were:

4) CloudBees is the company behind Jenkins. This is one of the oldest open source CI systems, and it would be strange not to see them in the lead.

3) Circle CI already in the second source was in the lead. So, too, deserves attention.

2) Microsoft - no comments. A huge company with great influence. If you are developing software for Windows, then most likely you will not even have a choice. If the company has purchased licenses for the Microsoft stack of development tools with TFS and their CI-system, you will just have what they give, without really looking around

1) GitLab comes first. For those who have not been following the company for the past few years, this may come as a complete surprise. Let's talk about it in more detail.

Gitlab

^{The author worked for a year at Gitlab as a Developer Advocate, so you have every right to consider me biased and unscrew the skepticism knob by a couple of tens of percent when reading the following paragraphs. Nevertheless, I think that many things about Gitlab should be known, and even better - start using it.}

It is important that Gitlab refused to consider itself simply a clone of Github, and the company decided to develop Gitlab in the direction of a “single window system” for the developer. Gitlab is already able to replace you with Github, Trello, CI-system, and so on on the list. All tools are already configured and integrated with each other. The result for the developer is a minimum of fuss with the settings.

Gitlab's source code is open, and you can use it for free on your own server if you wish. But we in this story are now most interested in the cloud version - GitLab.com and the possibilities that it gives:

Hosting an unlimited number of private projects
2000 build minutes per month for working with the built-in CI for free

Freebie, yeah. Well, the opportunity to learn, using for personal projects, as a result.

In short, you already guessed that my personal recommendation was GitLab CI . Gitlab has every chance of becoming the next big thing in the future. And to be an experienced specialist in a suddenly become fashionable thing is to be a sought-after specialist;)

If you are not ready to leave the Github, try Circle CI or Travis CI .

In the early stages of getting to know CI, avoid Jenkins and TeamCity . There are too many overheads in these old-school systems, so you will have to deal more with the features of the systems than to figure out the CI itself.

You start to think that Jenkins is CI, and this is stupid. A possible exception is you are a javist, a rockman or a Kotlinist, and for you everything works right out of the box.

If it’s written to your kind to program under Windows, then you definitely won’t pass by the Microsoft stack, but I wouldn’t use it of my own free will.

Ok, let's move on ...

What tasks to start with

The good news is that you don’t have to immediately become a CI expert. For starters, a simple “be in the subject” is enough. So start with something light:

For example, learn to deploy a personal site when pushing to the master.
Embed spellcheck with yaspeller for texts on your personal website
Launch a couple of linters in the CI of a minor working draft .

Launching your first build inside CI is a two-minute task. To start something useful, you will have to spend a little more time, because You need to set up a working environment for your scripts.

Here we will not delve into this. For those interested - links at the end of the article.

In the meantime, let's see how CI is used in large projects, and what power it can give you.

What gives CI in large projects

In large projects - serious requirements. Here you can’t get by with the launch of one or two teams. It is easy to imagine logic like this: 1) complete the task, only when committing to certain branches, 2) successfully complete the following tasks, 3) ignore errors when completing certain tasks, and 4) wait for the hand-held signal to complete the whole thing.

My favorite example is GitLab itself:

work is always in full swing there, and you can see how CI works right in real time: https://gitlab.com/gitlab-org/gitlab-ce/pipelines

This is a big project, and its CI-config I would not recommend watching without prior preparation.

Instead, better look at the visualization of the CI task chain:

https://gitlab.com/gitlab-org/gitlab-ce/pipelines/20201164

1) This is not in all CI-systems. Where it is, it gives the team a visual representation of what is happening in CI , and I find it worth a lot.

2) In addition, the CI-system settings file becomes a working documentation of the Continuous Integration process in the command. Any developer can add or change something there by creating a pull request with the desired changes.

3) Finally, the CI system becomes the central place where you can see where and when what went wrong during the deployment or test run.

Well, add to your taste all these traditional blah blah blah about speeding up the development process, increasing reliability, and reducing risks.

Where to dig further

(if you wanted to learn a little deeper)

1) Look at the screencast with examples of how to run deploy on S3 and Heroku using GitLab CI.

2) Read the introduction to GitLab CI and the deployment guide .

3) Smoke the documentation .

4) After you get comfortable with one CI-system, select and play with any other from the list of promising ones to make sure that the principle is common everywhere, and get even more confidence in your abilities.

Tags: