Docker workflow

    Moving the hexlet.io infrastructure to Docker took real effort. We abandoned many old approaches and tools and rethought the meaning of many familiar things. We like what we ended up with: most importantly, the transition greatly simplified and unified our setup and made it far more maintainable. In this article we describe the infrastructure and deployment scheme we eventually arrived at, along with the pros and cons of this approach.

    Background


    Initially, we needed Docker to run untrusted code in an isolated environment, a task somewhat similar to what hosting providers solve. We build images right in production, and they are then used to launch practice exercises. This, by the way, is one of those rare cases where the "one container, one service" principle does not apply: we need all the services and all the code for a given exercise to live in the same environment. At a minimum, each such container runs supervisord and our browser-based IDE. Everything beyond that depends on the exercise itself: the author can add and deploy whatever is needed, be it Redis or Hadoop.

    And it turned out that Docker gave us a simple way to assemble practice exercises. First, if an exercise builds and runs on the author's local machine, it is (almost) guaranteed to run in production as well, thanks to isolation. Second, although many people consider a Dockerfile to be "just bash" with all the attendant problems, that is not the case. Docker is a prime example of applying the functional paradigm in the right place: it provides idempotency not the way configuration management systems do, through internal verification mechanisms, but through immutability. A Dockerfile is indeed ordinary shell commands, but they are always applied to a fresh base image, so you never have to account for the previous state when changing an image. And layer caching (almost) removes the pain of waiting for rebuilds.
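    A toy Dockerfile (illustrative only, not ours) shows why caching helps so much: each instruction produces an immutable layer, and changing the application code invalidates only the layers that come after the changed files:

```dockerfile
FROM ruby:2.2.1
WORKDIR /usr/src/app
# These files change rarely, so the layers below stay cached between builds
COPY Gemfile Gemfile.lock /usr/src/app/
# Re-runs only when the Gemfile or lockfile changes
RUN bundle install
# Application code changes often; only the layers from here on are rebuilt
COPY . /usr/src/app
```

In practice this means that the expensive dependency-installation step is almost never repeated during day-to-day development.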

    At the moment, this subsystem is essentially continuous delivery for practice exercises. We may write a separate article about it if there is interest.

    Docker in infrastructure


    After that, we started thinking about moving the rest of our system to Docker as well. There were several reasons. The obvious one is unification: Docker already covered a serious (and far from trivial) part of the infrastructure.

    There is also another story here. Many years ago I used Chef, and after that Ansible, which is much simpler. Either way, I kept running into the same situation: if you do not have dedicated admins and do not work with the infrastructure and playbooks/cookbooks regularly, unpleasant situations keep coming up:
    • The configuration management system gets updated (especially a major release), and you spend two days trying to bring everything in line with it.
    • You forget that some software was already on the server, and a new roll-out causes conflicts or breaks everything. You need transitional states, or, as those who have learned this the hard way put it: "every time, deploy to a fresh server."
    • Redistributing services across servers is painful; everything affects everything else.
    • And a thousand smaller reasons, mostly stemming from the lack of isolation.


    Against this background, we looked at Docker as a miracle cure for these problems, and by and large that is what happened. Servers still have to be rebuilt from scratch periodically, but much less often, and, most importantly, we have moved to a new level of abstraction: at the configuration management level we now think in terms of services, not the parts they consist of. The unit of control is a service, not a package.

    The key to painless deployment is a fast and, importantly, simple rollback. With Docker, it almost always comes down to pinning the previous image version and restarting the services.
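    As a hedged sketch of what that pinning amounts to (the file path and tags here are stand-ins for illustration, and a temp file stands in for the real /etc/environment):

```shell
# Minimal rollback sketch: pin the previous image tag in the environment file.
ENV_FILE=$(mktemp)
echo 'HEXLET_VERSION=v101' > "$ENV_FILE"   # the version that turned out to be bad
PREVIOUS_TAG=v100                          # known-good tag (hypothetical)

sed -i "s/^HEXLET_VERSION=.*/HEXLET_VERSION=$PREVIOUS_TAG/" "$ENV_FILE"
cat "$ENV_FILE"                            # prints: HEXLET_VERSION=v100

# On the real servers this would be followed by restarting the services
# that read the file, e.g.:
#   sudo service unicorn restart
#   sudo service activejob restart
rm -f "$ENV_FILE"
```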

    And last but not least: building Hexlet has become a bit more complicated than just compiling assets (we are on Rails, yes). We have a massive JS infrastructure built with webpack. Naturally, all of it should be built once on one server and then simply distributed, which Capistrano cannot do.

    Infrastructure deployment


    Almost all we need from a configuration management system now is creating users and delivering keys, configs, and images. After switching to Docker, the playbooks became monotonous and simple: create users, add configs, occasionally a bit of cron.
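    A hypothetical sketch of the shape such a playbook takes (all names, paths, and the cleanup script are illustrative, not our actual playbook):

```yaml
# Illustrative only: the typical shape of a post-Docker playbook
- hosts: appservers
  become: yes
  tasks:
    - user: name={{ run_user }} groups=docker append=yes
    - authorized_key: user={{ run_user }} key="{{ lookup('file', 'keys/deploy.pub') }}"
    - template: src=upstart.unicorn.conf.j2 dest=/etc/init/unicorn.conf
    - cron: name="image cleanup" special_time=daily job="/usr/local/bin/docker-cleanup.sh"
```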

    Another very important point is how containers are launched. Even though Docker ships with its own supervisor out of the box, and Ansible has a module for running Docker containers, we decided not to use either (although we tried). The Docker module in Ansible has many problems, some with no obvious solution. This is largely due to the separation between creating and starting a container, with the configuration spread across those two stages.

    We eventually settled on upstart. Clearly we will have to move to systemd sooner or later, but it so happens that we run the Ubuntu version where upstart is the default. Along the way this also solved the question of uniform logging. And upstart lets you configure service restart behavior flexibly, unlike Docker's blunt `--restart=always`.

    upstart.unicorn.conf.j2
    description "Unicorn"
    start on filesystem or runlevel [2345]
    stop on runlevel [!2345]
    env HOME=/home/{{ run_user }}
    # change to match your deployment user
    setuid {{ run_user }}
    setgid team
    respawn
    respawn limit 3 30
    pre-start script
        . /etc/environment
        export HEXLET_VERSION
        /usr/bin/docker pull hexlet/hexlet-{{ rails_env }}:$HEXLET_VERSION
        /usr/bin/docker rm -f unicorn || true
    end script
    pre-stop script
        /usr/bin/docker rm -f unicorn || true
    end script
    script
      . /etc/environment
      export HEXLET_VERSION
      RUN_ARGS='--name unicorn' ~/apprunner.sh bundle exec unicorn_rails -p {{ unicorn_port }}
    end script
      



    The most interesting thing here is the service launch line:

    RUN_ARGS='--name unicorn' ~/apprunner.sh bundle exec unicorn_rails -p {{ unicorn_port }}
    


    This is done so that we can start the container on the server without writing out all the parameters by hand. For example, this is how we open a Rails console:

    RUN_ARGS='-it' ~/apprunner.sh bundle exec rails c


    apprunner.sh.j2
    #!/usr/bin/env bash
    . /etc/environment
    export HEXLET_VERSION
    : "${RUN_ARGS:=}"
    COMMAND="/usr/bin/docker run --read-only --rm \
      $RUN_ARGS \
      -v /tmp:/tmp \
      -v /var/tmp:/var/tmp \
      -p {{ unicorn_port }}:{{ unicorn_port }} \
      -e AWS_REGION={{ aws_region }} \
      -e SECRET_KEY_BASE={{ secret_key_base }} \
      -e DATABASE_URL={{ database_url }} \
      -e RAILS_ENV={{ rails_env }} \
      -e SMTP_USER_NAME={{ smtp_user_name }} \
      -e SMTP_PASSWORD={{ smtp_password }} \
      -e SMTP_ADDRESS={{ smtp_address }} \
      -e SMTP_PORT={{ smtp_port }} \
      -e SMTP_AUTHENTICATION={{ smtp_authentication }} \
      -e DOCKER_IP={{ docker_ip }} \
      -e STATSD_PORT={{ statsd_port }} \
      -e DOCKER_HUB_USERNAME={{ docker_hub_username }} \
      -e DOCKER_HUB_PASSWORD={{ docker_hub_password }} \
      -e DOCKER_HUB_EMAIL={{ docker_hub_email }} \
      -e DOCKER_EXERCISE_PREFIX={{ docker_exercise_prefix }} \
      -e FACEBOOK_CLIENT_ID={{ facebook_client_id }} \
      -e FACEBOOK_CLIENT_SECRET={{ facebook_client_secret }} \
      -e HEXLET_IDE_VERSION={{ hexlet_ide_image_tag }} \
      -e CDN_HOST={{ cdn_host }} \
      -e REFILE_CACHE_DIR={{ refile_cache_dir }} \
      -e CONTAINER_SERVER={{ container_server }} \
      -e CONTAINER_PORT={{ container_port }} \
      -e DOCKER_API_VERSION={{ docker_api_version }} \
      hexlet/hexlet-{{ rails_env }}:$HEXLET_VERSION $@"
    eval $COMMAND
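    The wrapper needs RUN_ARGS to default to an empty string when the caller does not set it. A minimal standalone sketch of the bash defaulting idiom it relies on (nothing here is Hexlet-specific):

```shell
# The ':' builtin evaluates its arguments and does nothing else, so this
# assigns an empty default to RUN_ARGS only when it is unset.
unset RUN_ARGS
: "${RUN_ARGS:=}"
echo "args=[$RUN_ARGS]"    # → args=[]

RUN_ARGS='-it'
: "${RUN_ARGS:=}"          # already set, so the default is not applied
echo "args=[$RUN_ARGS]"    # → args=[-it]
```

Note that a bare `${RUN_ARGS:=''}` on its own line would try to execute the variable's value as a command whenever it is set, which is why the `:` prefix matters.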
    



    There is one subtle point here: unfortunately, shell command history is lost between runs. To bring it back you would need to mount the corresponding history files into the container, but, frankly, we never got around to it.

    By the way, here you can see another advantage of Docker: all external dependencies are stated explicitly and in one place. If you are not familiar with this approach to configuration, I recommend the relevant document from Heroku.
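    The practical upshot (a hedged sketch; the variable value is made up) is that a container can be pointed at any backing service purely through its environment, with a hard failure at startup when something is missing:

```shell
# Hypothetical value; in production it arrives via `docker run -e ...`
export DATABASE_URL='postgres://app:secret@db.internal:5432/hexlet'

# ${VAR:?msg} aborts loudly if the variable is unset or empty,
# so misconfiguration fails at startup instead of deep inside the app.
echo "connecting to: ${DATABASE_URL:?DATABASE_URL must be set}"
```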

    Dockerization


    Dockerfile



    Dockerfile
    FROM ruby:2.2.1
    RUN mkdir -p /usr/src/app
    WORKDIR /usr/src/app
    ENV RAILS_ENV production
    ENV REFILE_CACHE_DIR /var/tmp/uploads
    RUN curl -sL https://deb.nodesource.com/setup | bash -
    RUN apt-get update -qq \
      && apt-get install -yqq apt-transport-https libxslt-dev libxml2-dev nodejs imagemagick
    RUN echo deb https://get.docker.com/ubuntu docker main > /etc/apt/sources.list.d/docker.list \
      && apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9 \
      && apt-get update -qq \
      && apt-get install -qqy lxc-docker-1.5.0
    # bundle config build.rugged --use-system-libraries
    # bundle config build.nokogiri --use-system-libraries
    COPY Gemfile /usr/src/app/
    COPY Gemfile.lock /usr/src/app/
    COPY package.json /usr/src/app/
    # without development test
    RUN npm install
    RUN bundle install --without development test
    COPY . /usr/src/app
    RUN ./node_modules/gulp/bin/gulp.js webpack_production
    RUN bin/rake assets:precompile
    VOLUME /usr/src/app/tmp
    VOLUME /var/folders
    



    The first line shows that we no longer need to worry about installing Ruby; we just specify the version we want to use (provided, of course, that an image exists for it).

    Containers are launched with the --read-only flag, which lets us control what gets written to disk. Practice shows that applications try to write everywhere, in completely unexpected places. Above you can see that we declare the volume /var/folders: Ruby writes there when creating temporary directories. Some directories, such as /var/tmp, we mount from the outside to share data between different versions. This is optional, but it saves us resources.

    We also install Docker inside the image so that we can control Docker from within a container. This is needed only for managing the practice-exercise images.

    Finally, in just four lines we describe everything that Capistrano used to do for us as an application build tool.

    Image hosting


    You can run your own Docker Distribution (formerly Registry), but we are quite happy with Docker Hub, where we pay $7 per month for 5 private repositories. It is, of course, far from perfect, both in usability and in capabilities, and sometimes an image build that should take 20 minutes drags on for an hour. Overall it is livable, and there are alternative cloud solutions as well.

    Assembly and Deploy


    The way you build the application differs depending on your deployment environment.

    For staging we use an automated build that triggers as soon as changes appear in the staging branch.



    As soon as the image is built, Docker Hub notifies Zapier via a webhook, which in turn sends the information to Slack. Unfortunately, Docker Hub cannot talk to Slack directly (and the developers do not plan to support it).

    Staging is deployed with the command:

    ansible-playbook deploy.yml -i staging.ini


    Here is how it looks in Slack:



    Unlike staging, the production image is not built automatically. When it is ready, it is built by a manual run on a special build server. In our case this server doubles as a bastion host.

    Another difference is the active use of tags. While staging always runs latest, here we explicitly specify a tag (which is the version) at build time.

    The build starts like this:

    ansible-playbook build.yml -i production.ini -e 'hexlet_image_tag=v100'
    


    build.yml
    - hosts: bastions
      gather_facts: no
      vars:
        clone_dir: /var/tmp/hexlet
      tasks:
        - git:
            repo: git@github.com:Hexlet/hexlet.git
            dest: '{{ clone_dir }}'
            accept_hostkey: yes
            key_file: /home/{{ run_user }}/.ssh/deploy_rsa
          become: yes
          become_user: '{{ run_user }}'
        - shell: 'cd {{ clone_dir }} && docker build -t hexlet/hexlet-production:{{ hexlet_image_tag }} .'
          become: yes
          become_user: '{{ run_user }}'
        - shell: 'docker push hexlet/hexlet-production:{{ hexlet_image_tag }}'
          become: yes
          become_user: '{{ run_user }}'
    



    The production deployment is performed by the command:

    ansible-playbook deploy.yml -i production.ini -e 'hexlet_image_tag=v100'
    


    deploy.yml
    - hosts: localhost
      gather_facts: no
      tasks:
      - local_action:
          module: slack
          domain: hexlet.slack.com
        token: '{{ slack_token }}'
          msg: "deploy started: {{ rails_env }}:{{ hexlet_image_tag }}"
          channel: "#operation"
          username: "{{ ansible_ssh_user }}"
    - hosts: appservers
      gather_facts: no
      tasks:
        - shell: docker pull hexlet/hexlet-{{ rails_env }}:{{ hexlet_image_tag }}
          become: yes
          become_user: '{{ run_user }}'
        - name: update hexlet version
          become: yes
          lineinfile:
            regexp: "HEXLET_VERSION"
            line: "HEXLET_VERSION={{ hexlet_image_tag }}"
            dest: /etc/environment
            backup: yes
            state: present
    - hosts: jobservers
      gather_facts: no
      tasks:
        - become: yes
          become_user: '{{ run_user }}'
          run_once: yes
          delegate_to: '{{ migration_server }}'
          shell: >
            docker run --rm
            -e 'SECRET_KEY_BASE={{ secret_key_base }}'
            -e 'DATABASE_URL={{ database_url }}'
            -e 'RAILS_ENV={{ rails_env }}'
            hexlet/hexlet-{{ rails_env }}:{{ hexlet_image_tag }}
            rake db:migrate
    - hosts: webservers
      gather_facts: no
      tasks:
        - service: name=nginx state=started
          become: yes
          tags: nginx
        - service: name=unicorn state=restarted
          become: yes
          tags: [unicorn, app]
    - hosts: jobservers
      gather_facts: no
      tasks:
        - service: name=activejob state=restarted
          become: yes
          tags: [activejob, app]
    - hosts: localhost
      gather_facts: no
      tasks:
      - name: "Send deploy hook to honeybadger"
        local_action: shell cd .. && bundle exec honeybadger deploy --environment={{ rails_env }}
      - local_action:
          module: slack
          domain: hexlet.slack.com
          token: '{{ slack_token }}'
          msg: "deploy completed ({{ rails_env }})"
          channel: "#operation"
          username: "{{ ansible_ssh_user }}"
          # link_names: 0
          # parse: 'none'
    



    In essence, the deployment boils down to pulling the necessary images onto the servers, running migrations, and restarting the services. It suddenly turned out that all of Capistrano was replaced by a couple dozen lines of straightforward code, and a dozen Capistrano integration gems were simply no longer needed: the jobs they did mostly turn into a single Ansible task each.

    Development


    The first thing you have to give up when working with Docker is developing natively on Mac OS. For normal operation you need Vagrant. To configure the environment we wrote a dedicated playbook, vagrant.yml. For example, in it we install and configure the database, even though in production we use RDS.

    Unfortunately (or maybe fortunately), we still have not managed to set up a comfortable development workflow through Docker itself; there are too many compromises and difficulties. At the same time, services like PostgreSQL, Redis and the like we do run through Docker even in development, and all of this is still managed through upstart.
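    For illustration, a hypothetical upstart job for such a development service, in the same style as the unicorn job above (the container name, port, and image tag are examples, not our actual config):

```
description "PostgreSQL (in Docker)"
start on filesystem
respawn
pre-start script
    /usr/bin/docker rm -f postgres || true
end script
script
    /usr/bin/docker run --rm --name postgres -p 5432:5432 postgres:9.4
end script
```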

    Monitoring


    Among the more interesting things: we set up Google's cAdvisor, which sent the collected metrics to InfluxDB. Periodically cAdvisor would start eating a wild amount of memory and had to be restarted by hand. And then it turned out that InfluxDB is fine, but alerting on top of it simply did not exist. All of this led us to abandon the homemade setup: now we run Datadog with the corresponding plugins enabled, and we are very happy with it.

    Problems


    After switching to Docker we immediately had to give up quick fixes: building an image can take up to an hour. But this pushes you toward a more correct flow, toward the ability to roll back to the previous version quickly and painlessly.

    Sometimes we hit bugs in Docker itself (more often than we would like). For example, right now we cannot move from 1.5 to 1.6.2 because they still have several open tickets about problems that many people run into.

    Total


    The mutable state of a server during software deployment is a pain point for any configuration management system. Docker takes over most of this work, which keeps the servers in a very clean state for a long time, and we do not have to worry about transition periods. Changing, say, the Ruby version has become not only a simple task but one that no longer depends on the administrator at all. The unified approach to launching, deploying, building, and operating lets us spend far less time on system maintenance. Yes, AWS certainly helps us a lot, but that does not take away from how easy Docker and Ansible are to work with.

    Plans


    As the next step, we want to implement continuous delivery and abandon staging entirely. The idea is that a rollout will first go to production servers accessible only from within the company.

    PS
    For those who are not yet familiar with Ansible: yesterday we released a basic course.
