Docker workflow

    Moving the hexlet.io infrastructure to Docker took real effort. We abandoned many old approaches and tools and rethought the meaning of many familiar things. We like what we ended up with: most importantly, the transition greatly simplified and unified our setup and made it far more maintainable. In this article we describe the infrastructure and deployment scheme we eventually arrived at, along with the pros and cons of this approach.

    Background


    Initially, we needed Docker to run untrusted code in an isolated environment, a task somewhat similar to what hosting providers solve. We build images right in production, and they are then used to launch practice exercises. This, by the way, is one of those rare cases where the "one container, one service" principle does not apply: we need all the services and all the code for a given exercise to live in the same environment. At a minimum, each such container runs supervisord and our browser-based IDE. Everything beyond that depends on the exercise itself: the author can add and deploy whatever is needed, be it Redis or Hadoop.

    And it turned out that Docker gave us a simple way to assemble practice exercises. First, if an exercise builds and runs on the author's local machine, it is (almost) guaranteed to run in production as well, thanks to isolation. Second, although many people consider a Dockerfile to be "just bash" with all the attendant problems, that is not the case. Docker is a prime example of applying the functional paradigm in the right place: it provides idempotency not the way configuration management systems do, through internal verification mechanisms, but through immutability. A Dockerfile is indeed ordinary shell commands, but they are always applied to a fresh base image, so you never have to account for the previous state when changing an image. And layer caching (almost) removes the pain of waiting for rebuilds.
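    A toy Dockerfile (illustrative only, not ours) shows why caching helps so much: each instruction produces an immutable layer, and changing the application code invalidates only the layers that come after the changed files:

```dockerfile
FROM ruby:2.2.1
WORKDIR /usr/src/app
# These files change rarely, so the layers below stay cached between builds
COPY Gemfile Gemfile.lock /usr/src/app/
# Re-runs only when the Gemfile or lockfile changes
RUN bundle install
# Application code changes often; only the layers from here on are rebuilt
COPY . /usr/src/app
```

In practice this means that the expensive dependency-installation step is almost never repeated during day-to-day development.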

    At the moment, this subsystem is essentially continuous delivery for practice exercises. We may write a separate article about it if there is interest.

    Docker in infrastructure


    After that, we started thinking about moving the rest of our system to Docker as well. There were several reasons. The obvious one is unification: Docker already covered a serious (and far from trivial) part of the infrastructure.

    There is also another story here. Many years ago I used Chef, and after that Ansible, which is much simpler. Either way, I kept running into the same situation: if you do not have dedicated admins and do not work with the infrastructure and playbooks/cookbooks regularly, unpleasant situations keep coming up:
    • The configuration management system gets updated (especially a major release), and you spend two days trying to bring everything in line with it.
    • You forget that some software was already on the server, and a new roll-out causes conflicts or breaks everything. You need transitional states, or, as those who have learned this the hard way put it: "every time, deploy to a fresh server."
    • Redistributing services across servers is painful; everything affects everything else.
    • And a thousand smaller reasons, mostly stemming from the lack of isolation.


    Against this background, we looked at Docker as a miracle cure for these problems, and by and large that is what happened. Servers still have to be rebuilt from scratch periodically, but much less often, and, most importantly, we have moved to a new level of abstraction: at the configuration management level we now think in terms of services, not the parts they consist of. The unit of control is a service, not a package.

    The key to painless deployment is a fast and, importantly, simple rollback. With Docker, it almost always comes down to pinning the previous image version and restarting the services.
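    As a hedged sketch of what that pinning amounts to (the file path and tags here are stand-ins for illustration, and a temp file stands in for the real /etc/environment):

```shell
# Minimal rollback sketch: pin the previous image tag in the environment file.
ENV_FILE=$(mktemp)
echo 'HEXLET_VERSION=v101' > "$ENV_FILE"   # the version that turned out to be bad
PREVIOUS_TAG=v100                          # known-good tag (hypothetical)

sed -i "s/^HEXLET_VERSION=.*/HEXLET_VERSION=$PREVIOUS_TAG/" "$ENV_FILE"
cat "$ENV_FILE"                            # prints: HEXLET_VERSION=v100

# On the real servers this would be followed by restarting the services
# that read the file, e.g.:
#   sudo service unicorn restart
#   sudo service activejob restart
rm -f "$ENV_FILE"
```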

    And last but not least: building Hexlet has become a bit more complicated than just compiling assets (we are on Rails, yes). We have a massive JS infrastructure built with webpack. Naturally, all of it should be built once on one server and then simply distributed, which Capistrano cannot do.

    Infrastructure deployment


    Almost all we need from a configuration management system now is creating users and delivering keys, configs, and images. After switching to Docker, the playbooks became monotonous and simple: create users, add configs, occasionally a bit of cron.
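    A hypothetical sketch of the shape such a playbook takes (all names, paths, and the cleanup script are illustrative, not our actual playbook):

```yaml
# Illustrative only: the typical shape of a post-Docker playbook
- hosts: appservers
  become: yes
  tasks:
    - user: name={{ run_user }} groups=docker append=yes
    - authorized_key: user={{ run_user }} key="{{ lookup('file', 'keys/deploy.pub') }}"
    - template: src=upstart.unicorn.conf.j2 dest=/etc/init/unicorn.conf
    - cron: name="image cleanup" special_time=daily job="/usr/local/bin/docker-cleanup.sh"
```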

    Another very important point is how containers are launched. Even though Docker ships with its own supervisor out of the box, and Ansible has a module for running Docker containers, we decided not to use either (although we tried). The Docker module in Ansible has many problems, some with no obvious solution. This is largely due to the separation between creating and starting a container, with the configuration spread across those two stages.

    We eventually settled on upstart. Clearly we will have to move to systemd sooner or later, but it so happens that we run the Ubuntu version where upstart is the default. Along the way this also solved the question of uniform logging. And upstart lets you configure service restart behavior flexibly, unlike Docker's blunt `--restart=always`.

    upstart.unicorn.conf.j2
    description "Unicorn"
    start on filesystem or runlevel [2345]
    stop on runlevel [!2345]
    env HOME=/home/{{ run_user }}
    # change to match your deployment user
    setuid {{ run_user }}
    setgid team
    respawn
    respawn limit 3 30
    pre-start script
        . /etc/environment
        export HEXLET_VERSION
        /usr/bin/docker pull hexlet/hexlet-{{ rails_env }}:$HEXLET_VERSION
        /usr/bin/docker rm -f unicorn || true
    end script
    pre-stop script
        /usr/bin/docker rm -f unicorn || true
    end script
    script
      . /etc/environment
      export HEXLET_VERSION
      RUN_ARGS='--name unicorn' ~/apprunner.sh bundle exec unicorn_rails -p {{ unicorn_port }}
    end script
      



    The most interesting thing here is the service launch line:

    RUN_ARGS='--name unicorn' ~/apprunner.sh bundle exec unicorn_rails -p {{ unicorn_port }}
    


    This is done so that we can start the container on the server without writing out all the parameters by hand. For example, this is how we open a Rails console:

    RUN_ARGS='-it' ~/apprunner.sh bundle exec rails c


    apprunner.sh.j2
    #!/usr/bin/env bash
    . /etc/environment
    export HEXLET_VERSION
    : "${RUN_ARGS:=}"
    COMMAND="/usr/bin/docker run --read-only --rm \
      $RUN_ARGS \
      -v /tmp:/tmp \
      -v /var/tmp:/var/tmp \
      -p {{ unicorn_port }}:{{ unicorn_port }} \
      -e AWS_REGION={{ aws_region }} \
      -e SECRET_KEY_BASE={{ secret_key_base }} \
      -e DATABASE_URL={{ database_url }} \
      -e RAILS_ENV={{ rails_env }} \
      -e SMTP_USER_NAME={{ smtp_user_name }} \
      -e SMTP_PASSWORD={{ smtp_password }} \
      -e SMTP_ADDRESS={{ smtp_address }} \
      -e SMTP_PORT={{ smtp_port }} \
      -e SMTP_AUTHENTICATION={{ smtp_authentication }} \
      -e DOCKER_IP={{ docker_ip }} \
      -e STATSD_PORT={{ statsd_port }} \
      -e DOCKER_HUB_USERNAME={{ docker_hub_username }} \
      -e DOCKER_HUB_PASSWORD={{ docker_hub_password }} \
      -e DOCKER_HUB_EMAIL={{ docker_hub_email }} \
      -e DOCKER_EXERCISE_PREFIX={{ docker_exercise_prefix }} \
      -e FACEBOOK_CLIENT_ID={{ facebook_client_id }} \
      -e FACEBOOK_CLIENT_SECRET={{ facebook_client_secret }} \
      -e HEXLET_IDE_VERSION={{ hexlet_ide_image_tag }} \
      -e CDN_HOST={{ cdn_host }} \
      -e REFILE_CACHE_DIR={{ refile_cache_dir }} \
      -e CONTAINER_SERVER={{ container_server }} \
      -e CONTAINER_PORT={{ container_port }} \
      -e DOCKER_API_VERSION={{ docker_api_version }} \
      hexlet/hexlet-{{ rails_env }}:$HEXLET_VERSION $@"
    eval $COMMAND
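    The wrapper needs RUN_ARGS to default to an empty string when the caller does not set it. A minimal standalone sketch of the bash defaulting idiom it relies on (nothing here is Hexlet-specific):

```shell
# The ':' builtin evaluates its arguments and does nothing else, so this
# assigns an empty default to RUN_ARGS only when it is unset.
unset RUN_ARGS
: "${RUN_ARGS:=}"
echo "args=[$RUN_ARGS]"    # → args=[]

RUN_ARGS='-it'
: "${RUN_ARGS:=}"          # already set, so the default is not applied
echo "args=[$RUN_ARGS]"    # → args=[-it]
```

Note that a bare `${RUN_ARGS:=''}` on its own line would try to execute the variable's value as a command whenever it is set, which is why the `:` prefix matters.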
    



    There is one subtle point here: unfortunately, shell command history is lost between runs. To bring it back you would need to mount the corresponding history files into the container, but, frankly, we never got around to it.

    By the way, here you can see another advantage of Docker: all external dependencies are stated explicitly and in one place. If you are not familiar with this approach to configuration, I recommend the relevant document from Heroku.
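    The practical upshot (a hedged sketch; the variable value is made up) is that a container can be pointed at any backing service purely through its environment, with a hard failure at startup when something is missing:

```shell
# Hypothetical value; in production it arrives via `docker run -e ...`
export DATABASE_URL='postgres://app:secret@db.internal:5432/hexlet'

# ${VAR:?msg} aborts loudly if the variable is unset or empty,
# so misconfiguration fails at startup instead of deep inside the app.
echo "connecting to: ${DATABASE_URL:?DATABASE_URL must be set}"
```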

    Dockerization


    Dockerfile



    Dockerfile
    FROM ruby:2.2.1
    RUN mkdir -p /usr/src/app
    WORKDIR /usr/src/app
    ENV RAILS_ENV production
    ENV REFILE_CACHE_DIR /var/tmp/uploads
    RUN curl -sL https://deb.nodesource.com/setup | bash -
    RUN apt-get update -qq \
      && apt-get install -yqq apt-transport-https libxslt-dev libxml2-dev nodejs imagemagick
    RUN echo deb https://get.docker.com/ubuntu docker main > /etc/apt/sources.list.d/docker.list \
      && apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9 \
      && apt-get update -qq \
      && apt-get install -qqy lxc-docker-1.5.0
    # bundle config build.rugged --use-system-libraries
    # bundle config build.nokogiri --use-system-libraries
    COPY Gemfile /usr/src/app/
    COPY Gemfile.lock /usr/src/app/
    COPY package.json /usr/src/app/
    # without development test
    RUN npm install
    RUN bundle install --without development test
    COPY . /usr/src/app
    RUN ./node_modules/gulp/bin/gulp.js webpack_production
    RUN bin/rake assets:precompile
    VOLUME /usr/src/app/tmp
    VOLUME /var/folders
    



    The first line shows that we no longer need to worry about installing Ruby; we just specify the version we want to use (provided, of course, that an image exists for it).

    Containers are launched with the --read-only flag, which lets us control what gets written to disk. Practice shows that applications try to write everywhere, in completely unexpected places. Above you can see that we declare the volume /var/folders: Ruby writes there when creating temporary directories. Some directories, such as /var/tmp, we mount from the outside to share data between different versions. This is optional, but it saves us resources.

    We also install Docker inside the image so that we can control Docker from within a container. This is needed only for managing the practice-exercise images.

    Finally, in just four lines we describe everything that Capistrano used to do for us as an application build tool.

    Image hosting


    You can run your own Docker Distribution (formerly Registry), but we are quite happy with Docker Hub, where we pay $7 per month for 5 private repositories. It is, of course, far from perfect, both in usability and in capabilities, and sometimes an image build that should take 20 minutes drags on for an hour. Overall it is livable, and there are alternative cloud solutions as well.

    Assembly and Deploy


    The way you build the application differs depending on your deployment environment.

    For staging we use an automated build that triggers as soon as changes appear in the staging branch.



    As soon as the image is built, Docker Hub notifies Zapier via a webhook, which in turn sends the information to Slack. Unfortunately, Docker Hub cannot talk to Slack directly (and the developers do not plan to support it).

    Staging is deployed with the command:

    ansible-playbook deploy.yml -i staging.ini


    Here is how it looks in Slack:



    Unlike staging, the production image is not built automatically. When it is ready, it is built by a manual run on a special build server. In our case this server doubles as a bastion host.

    Another difference is the active use of tags. While staging always runs latest, here we explicitly specify a tag (which is the version) at build time.

    The build starts like this:

    ansible-playbook build.yml -i production.ini -e 'hexlet_image_tag=v100'
    


    build.yml
    - hosts: bastions
      gather_facts: no
      vars:
        clone_dir: /var/tmp/hexlet
      tasks:
        - git:
            repo: git@github.com:Hexlet/hexlet.git
            dest: '{{ clone_dir }}'
            accept_hostkey: yes
            key_file: /home/{{ run_user }}/.ssh/deploy_rsa
          become: yes
          become_user: '{{ run_user }}'
        - shell: 'cd {{ clone_dir }} && docker build -t hexlet/hexlet-production:{{ hexlet_image_tag }} .'
          become: yes
          become_user: '{{ run_user }}'
        - shell: 'docker push hexlet/hexlet-production:{{ hexlet_image_tag }}'
          become: yes
          become_user: '{{ run_user }}'
    



    The production deployment is performed by the command:

    ansible-playbook deploy.yml -i production.ini -e 'hexlet_image_tag=v100'
    


    deploy.yml
    - hosts: localhost
      gather_facts: no
      tasks:
      - local_action:
          module: slack
          domain: hexlet.slack.com
        token: '{{ slack_token }}'
          msg: "deploy started: {{ rails_env }}:{{ hexlet_image_tag }}"
          channel: "#operation"
          username: "{{ ansible_ssh_user }}"
    - hosts: appservers
      gather_facts: no
      tasks:
        - shell: docker pull hexlet/hexlet-{{ rails_env }}:{{ hexlet_image_tag }}
          become: yes
          become_user: '{{ run_user }}'
        - name: update hexlet version
          become: yes
          lineinfile:
            regexp: "HEXLET_VERSION"
            line: "HEXLET_VERSION={{ hexlet_image_tag }}"
            dest: /etc/environment
            backup: yes
            state: present
    - hosts: jobservers
      gather_facts: no
      tasks:
        - become: yes
          become_user: '{{ run_user }}'
          run_once: yes
          delegate_to: '{{ migration_server }}'
          shell: >
            docker run --rm
            -e 'SECRET_KEY_BASE={{ secret_key_base }}'
            -e 'DATABASE_URL={{ database_url }}'
            -e 'RAILS_ENV={{ rails_env }}'
            hexlet/hexlet-{{ rails_env }}:{{ hexlet_image_tag }}
            rake db:migrate
    - hosts: webservers
      gather_facts: no
      tasks:
        - service: name=nginx state=started
          become: yes
          tags: nginx
        - service: name=unicorn state=restarted
          become: yes
          tags: [unicorn, app]
    - hosts: jobservers
      gather_facts: no
      tasks:
        - service: name=activejob state=restarted
          become: yes
          tags: [activejob, app]
    - hosts: localhost
      gather_facts: no
      tasks:
      - name: "Send deploy hook to honeybadger"
        local_action: shell cd .. && bundle exec honeybadger deploy --environment={{ rails_env }}
      - local_action:
          module: slack
          domain: hexlet.slack.com
          token: '{{ slack_token }}'
          msg: "deploy completed ({{ rails_env }})"
          channel: "#operation"
          username: "{{ ansible_ssh_user }}"
          # link_names: 0
          # parse: 'none'
    



    In essence, the deployment boils down to pulling the necessary images onto the servers, running migrations, and restarting the services. It suddenly turned out that all of Capistrano was replaced by a couple dozen lines of straightforward code, and a dozen Capistrano integration gems were simply no longer needed: the jobs they did mostly turn into a single Ansible task each.

    Development


    The first thing you have to give up when working with Docker is developing natively on Mac OS. For normal operation you need Vagrant. To configure the environment we wrote a dedicated playbook, vagrant.yml. For example, in it we install and configure the database, even though in production we use RDS.

    Unfortunately (or maybe fortunately), we still have not managed to set up a comfortable development workflow through Docker itself; there are too many compromises and difficulties. At the same time, services like PostgreSQL, Redis and the like we do run through Docker even in development, and all of this is still managed through upstart.
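    For illustration, a hypothetical upstart job for such a development service, in the same style as the unicorn job above (the container name, port, and image tag are examples, not our actual config):

```
description "PostgreSQL (in Docker)"
start on filesystem
respawn
pre-start script
    /usr/bin/docker rm -f postgres || true
end script
script
    /usr/bin/docker run --rm --name postgres -p 5432:5432 postgres:9.4
end script
```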

    Monitoring


    Among the more interesting things: we set up Google's cAdvisor, which sent the collected metrics to InfluxDB. Periodically cAdvisor would start eating a wild amount of memory and had to be restarted by hand. And then it turned out that InfluxDB is fine, but alerting on top of it simply did not exist. All of this led us to abandon the homemade setup: now we run Datadog with the corresponding plugins enabled, and we are very happy with it.

    Problems


    After switching to Docker we immediately had to give up quick fixes: building an image can take up to an hour. But this pushes you toward a more correct flow, toward the ability to roll back to the previous version quickly and painlessly.

    Sometimes we hit bugs in Docker itself (more often than we would like). For example, right now we cannot move from 1.5 to 1.6.2 because they still have several open tickets about problems that many people run into.

    Total


    The mutable state of a server during software deployment is a pain point for any configuration management system. Docker takes over most of this work, which keeps the servers in a very clean state for a long time, and we do not have to worry about transition periods. Changing, say, the Ruby version has become not only a simple task but one that no longer depends on the administrator at all. The unified approach to launching, deploying, building, and operating lets us spend far less time on system maintenance. Yes, AWS certainly helps us a lot, but that does not take away from how easy Docker and Ansible are to work with.

    Plans


    As the next step, we want to implement continuous delivery and abandon staging entirely. The idea is that a rollout will first go to production servers accessible only from within the company.

    PS
    For those who are not yet familiar with Ansible: yesterday we released a basic course.
