An ensemble of salty cooks and puppeteers: comparing Ansible, SaltStack, Chef and Puppet

Today we will talk about what SCM is and tell a few stories through which we look at Ansible, SaltStack, Chef and Puppet, choosing the best option for a specific task.

This material is based on the transcript of a talk by Andrey Filatov, a lead systems engineer at EPAM Systems, given at our October conference DevOops 2017.

What is SCM and what is it for?

What is SCM? First of all, it is a thing that takes our infrastructure from state A, executes some code, and brings it to state B. Many developers who are not practicing DevOps engineers think that things on the infrastructure happen in an “automagic” way.

The “automagic” is implemented by SCM (System Configuration Management). What is it for? First of all, to build repeatable and consistent infrastructures. SCM extends CI / CD processes well. Since it is code, it can be stored in any version control system, such as Git or Mercurial. It is quite simple to develop and maintain.

The end result is a closed automation loop: everything can be done automatically, from creating the infrastructure to deploying it and deploying code onto it.
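
The idea of bringing infrastructure from state A to state B can be illustrated with a minimal Python sketch. It is not how any of the tools below is implemented; the `converge` function and the package names are made up for illustration. It only shows the key property: a second run against an already converged system changes nothing.

```python
def converge(current: dict, desired: dict) -> list:
    """Apply only the changes needed to move `current` to `desired`.

    Returns the list of actions that were performed.
    """
    actions = []
    for key, value in desired.items():
        if current.get(key) != value:
            actions.append(f"set {key}={value}")
            current[key] = value
    return actions


# Hypothetical states: A is what we have, B is what we want.
state_a = {"ntp": "absent", "nginx": "1.10"}
state_b = {"ntp": "installed", "nginx": "1.12"}

first_run = converge(state_a, state_b)   # performs two changes
second_run = converge(state_a, state_b)  # nothing left to do

print(first_run)
print(second_run)
```

Because the second run is a no-op, the same code can be executed repeatedly from a CI / CD pipeline without side effects.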

What is SCM: Ansible

Let's look at our contenders. The first is Ansible. The open source version has an agentless architecture. It is written in Python and has a YAML-like DSL that is easily extended with modules written in Python. It is very simple and lightweight. Ansible has the lowest entry threshold: you can teach it to anyone.

In one case, a person who knew neither Python nor anything about SCM got into Ansible in just two days and was already doing useful work.
Below is an example of ChatOps: a Slack notifier. For anyone who has seen YAML, the Ansible code holds nothing new.

- block:
  - name: "SlackNotify : Deploy Start"
    local_action:
      module: slack
      token: "{{ slack_url }}"
      attachments:
        - title: "Deploy to {{ inventory_hostname }} has been Started"
          text: "<!here> :point_up_2:"
          color: "#551a8b"
  - include: configure.yml
    tags:
      - configure
  - include: remote-fetch.yml
    tags:
      - remote
  - include: assets.yml

What is SCM: Chef

Chef has a client-server architecture: there is a Chef server and a Chef client. Configuration is based on search. It is written in Ruby and has a Ruby DSL. Accordingly, inside your cookbooks and recipes you can use the full power of Ruby, but I do not advise doing this. Chef has a huge community and the largest collection of tools among all SCMs. This is what Chef code looks like; this example deploys Jetty.

#
# Cookbook Name:: dg-app-edl
# Recipe:: fe
#
node.normal[:jetty][:home] = "/usr/share/jetty"
node.normal[:jetty][:group] = "deploy"
include_recipe "dg-auth::deploy"
include_recipe "newrelic::repository"
include_recipe "newrelic::server-monitor"
include_recipe "dg-jetty::jetty9"
include_recipe "newrelic::java-agent"
directory "edl" do
  action :create
  owner # the owner value is missing in the source
  group "deploy"
  mode "0775"
  path "/usr/share/where/edl"
  recursive true
end

What is SCM: SaltStack

SaltStack has both an agentless architecture that works in push mode using Salt-SSH and a client-server architecture with a Salt master and Salt minions. The emphasis is on real-time automation: it executes all processes in parallel out of the box. It is written in Python and also uses a YAML-like language, so the code is very similar to Ansible.

ntp-packages:
  pkg.installed:
    - pkgs:
      - ntp
      - ntpdate
/etc/ntp.conf:
  file:
    - managed
    - source: salt://common/ntpd/ntp.conf
    - template: jinja
    - mode: 644
/etc/sysconfig/ntpd:
  file:
    - managed
    - source: salt://common/ntpd/ntpd
    - template: jinja
    - mode: 644
ntp-service:
  service.running:
    - name: ntpd

What is SCM: Puppet

The last of our contenders is Puppet. Like Chef, it has a client-server architecture, but configuration is based not on search but on “facts” that arrive at the Puppet master. It is written in Ruby and has a Ruby-like DSL. However, the Puppet developers do not allow pure Ruby code in manifests. This is both a plus and a minus. This is what Puppet manifest code looks like:

class { 'mysql::server' :
  root_password => 'password'
}
mysql::db{ ['test', 'test2', 'test3']:
  ensure => present,
  charset => 'utf8',
  require => Class['mysql::server'],
}
mysql::db{ 'test4':
  ensure => present,
  charset => 'latin1',
}
mysql::db{ 'test5':
  ensure => present,
  charset => 'binary',
  collate => 'binary',
}

SCM in practice

SaltStack in a Demilitarized Environment

First of all, I would like to share a project that was built on SaltStack. This is our most recent project and our freshest pain, and fresh pain always hurts the most. Our customer works in data storage: it produces hardware servers for data storage based on GPFS and GlusterFS, but in custom builds. The customer came to us with the following tasks:

Create a USB / DVD installer. We need to create media from which everything is installed. This is for the customer's clients who work in closed environments, where the servers usually have no Internet access. We need to pack everything into one ISO and send it to field engineers, who will deploy everything they need on site.
Deploy a cluster with a product. The customer has several large products, and we must be able to deploy them in cluster mode.
Manage, configure and maintain the cluster using a CLI utility. Our framework should help field engineers manage the cluster.

The customer had several requirements. First of all, the customer has a lot of Python expertise; in fact, it employs only C and Python developers. The customer immediately said: “We want SaltStack”, leaving us no choice.

What did we run into? An installation contains several of the customer's products, and all of them must be Salt masters. But we ran into a scaling problem with the multi-master configuration. For example, fetching NODE Info (the state of a specific server) took milliseconds with two masters, already took seconds with three, and with five we never saw the operation complete. MultiMaster is a good feature, but it scales poorly.

The second problem we ran into concerned teamwork: SaltStack has Runners and Modules. A Module is an extension that runs on the Salt minion, on the machine itself. A Runner runs on the server side. We very often argued about what should be a Runner and what should be a Module.

Then we came across a small surprise from cache.mine:

ime = ImeActions()
id = __grains__['id']
if id == ime.master_id:
    # Look up a previously generated UUID in the mine.
    ret = __salt__['mine.get'](id, 'ime_actions.make_bfs_uuid')
    ime_dict = ret.get(id, None)
    if not ime_dict:
        try:
            # Nothing cached yet: generate the UUID and publish it.
            result = __salt__['mine.send']('ime_actions.make_bfs_uuid')
        except Exception as e:
            log.error("Failed to generate uuid: {0}.".format(str(e)))
            result = False
    else:
        # ... (the snippet is cut off here in the source)

We have a utility written in C. When we run it, it generates a random ID. The ID must be unique among all cluster members, so we need to generate it once on the master and then distribute it to the machines. We used cache.mine for this. As it turned out, it does not survive a reboot.
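
The requirement here is "generate once, then always return the same value, even after a restart". A minimal Python sketch of that requirement follows. It is not what the project did and does not use any Salt API: the file location is hypothetical, and `uuid.uuid4()` stands in for the C utility.

```python
import os
import tempfile
import uuid


def get_or_create_cluster_id(path: str) -> str:
    """Return the ID stored at `path`; generate and save it on the first call."""
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    new_id = str(uuid.uuid4())  # stand-in for the C utility from the text
    with open(path, "w") as f:
        f.write(new_id)
    return new_id


with tempfile.TemporaryDirectory() as tmp:
    id_file = os.path.join(tmp, "cluster_id")  # hypothetical location
    first = get_or_create_cluster_id(id_file)
    # A second call simulates the lookup after a restart: the ID comes from disk.
    after_restart = get_or_create_cluster_id(id_file)
    same_after_restart = first == after_restart

print(same_after_restart)
```

Any store that keeps only in-memory state fails this check, which is the kind of surprise described above.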

"Race condition". Parallelization is good, but in the default configuration state.orchestrate collides with a state.sls that is still running when long processes are involved. After a timeout, it assumes that the state has already completed, although it is still running, and tries to start the next one. An error occurs. This problem has not yet been fixed.

You can look at the issue on GitHub.
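
The failure mode described above, treating a timeout as completion, can be reproduced with a short Python sketch. This is a generic illustration using threads, not Salt code; the step names and timings are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def long_state(log: list) -> None:
    """A long-running step, standing in for a slow state."""
    time.sleep(0.3)
    log.append("state-1 finished")


def orchestrate(wait_for_completion: bool) -> list:
    """Run state-1, then state-2, and return the order of events."""
    log = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        job = pool.submit(long_state, log)
        try:
            job.result(timeout=0.05)  # the orchestrator's patience runs out
        except FutureTimeout:
            if wait_for_completion:
                job.result()  # block until the step has really finished
            # otherwise: wrongly assume the step is done and move on
        log.append("state-2 started")
    return log


print(orchestrate(wait_for_completion=False))
print(orchestrate(wait_for_completion=True))
```

In the first call the next step starts while the previous one is still running; in the second call the order is correct because completion is checked explicitly instead of being inferred from the timeout.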

What could we use besides SaltStack?

SaltStack in a DMZ environment

DMZ. Chef packages well, and so does Puppet. Ansible has a problem: without Tower, there is no way to run the configuration in pull mode from our nodes, which is what the demilitarized zone requires.
Framework for the CLI (in Python). Chef and Puppet are not a great fit, but if you are not restricted to Python, you can write in Ruby and use the Chef or Puppet APIs. Ansible does not support this kind of tool.
Cluster management. Chef is well suited for managing clusters, and so is Puppet; Ansible was originally written to manage clusters on Amazon.

Chef in a large and dynamic environment

The customer came with the task of consolidating all resources in one cloud, which was OpenStack. Before that, everything was scattered: some things on Rackspace Cloud, some on dedicated servers or in the customer's private data centers.

They wanted fully dynamic resource management, and also wanted their applications to be able to add capacity for themselves when needed. That is, we needed a fully dynamic infrastructure and a fully dynamic environment that scales both up and down.

To build a CD process properly, you need a fully automated environment. We created an SDLC (Software Development Lifecycle) for them and applied it to SCM as well. Integration tests are run not only for applications but also for infrastructure.

Accordingly, when something goes wrong, we must be able, like the guys from Netflix, to kill defective resources and bring up fresh, guaranteed-working ones to replace them.

What problems did we face?

It was 2013, and we used Chef 10, which has a slow search. We ran a search across all the machines, and it took forever. We tried to solve the problem with a naming convention and by searching on fqdn. This narrowed the search scope and made the search faster.

But some operations have to be done across the whole environment. So we ran the search once at the very beginning, saved the result in an attribute, and filtered the results with Ruby: we picked out the pieces we needed and did what was needed.
```
if !Chef::Config[:solo]
  search(:node, "fqdn:*metro-#{node[:env]}-mongodb*").each do |mongo|
    @mongodbs << mongo.fqdn
  end
else
  @mongodbs = ["lvs-metro-#{node[:env]}-mongodb3001.qa.example.com"]
end
```
Bottom line: use naming conventions, run the search once, and use Ruby to filter the results you need.
Using "node.save" is not safe, so be careful. We ran into this problem when deploying MySQL clusters, where we used node.save inside a recipe on a MySQL node that was not fully configured. During scale-up, some applications returned 500 errors. It turned out that we were saving the node at the wrong moment: the node is sent to the Chef server, and then the Chef client on the UI picks up a new node that has not yet been configured to a working state.
The absence of splay can kill the Chef server. Splay is a Chef client parameter that lets you specify a range within which the client contacts the server for its configuration. Under heavy load, when you need to deploy many nodes at the same time, this keeps you from killing the server.
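
The effect of splay can be shown with a small Python sketch. It assumes the usual meaning of the parameter, a random delay between zero and the splay value added to the run interval; the interval of 1800 seconds, the splay of 300 seconds and the function name are chosen purely for illustration.

```python
import random


def next_run_delay(interval: int, splay: int, rng: random.Random) -> float:
    """Seconds until the next client run: the interval plus a random splay."""
    return interval + rng.uniform(0, splay)


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# 1000 hypothetical clients that all started at the same moment.
delays = [next_run_delay(1800, 300, rng) for _ in range(1000)]

# Without splay all 1000 clients would contact the server in the same second.
distinct_seconds = len({round(d) for d in delays})
print(min(delays) >= 1800, max(delays) <= 2100, distinct_seconds)
```

With splay the same 1000 requests are spread over a five-minute window instead of arriving at once.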

What can we use instead of Chef?

Dynamic provisioning. SaltStack is perfect because it has SaltCloud, which integrates well everywhere. Puppet has similar functionality, but only in Puppet Enterprise, for money. Ansible is good if the company “lives” in Amazon; for anything else you can tie it to alternatives, but this is not as convenient.
SDLC. Chef has everything, from Test Kitchen to a choice of tools for integration testing. SaltStack has all the available Python tools, and Puppet now has everything too. Ansible has a Role Spec, and you can use Test Kitchen from Chef, but this is not a native tool.
Resource replacement. In Chef everything is fine, in SaltStack you can extend SaltCloud to the desired state, in Puppet the tools are only in the Enterprise version, and Ansible works well only with Amazon.

EPAM Private Cloud with Chef

A year and a half before AWS OpsWorks appeared, we wanted to create an advanced version of Amazon CloudFormation with Chef integrated, so that resources were not only deployed but also configured.

The second global task was to create a service catalog so that customers and users could deploy a fully ready-to-use LAMP stack using the CLI.

We chose Chef, but the project had to support different SCMs. We started with the built-in Chef server, and users could also use their own Chef server hosted somewhere on their side. That is, we did not have access to user resources and laptops, but it still worked.

To implement CloudFormation + OpsWorks, any SCM will do; they all fit. For creating the catalog, everything except SaltStack copes well. SaltStack has a nuance: it is extremely difficult to find a specialist who knows SaltStack well and can create a service and fill the catalog.

SCM's popularity in EPAM

These are the SCM popularity statistics inside EPAM. SaltStack is very far behind. Ansible is in first place: it is the simplest and has the lowest entry threshold. When we try to find someone on the market with SCM knowledge, the market looks about the same.

Work with Ansible

Tips I can give for working with Ansible:

Use 'accelerate': it deploys configurations 2-6 times faster than SSH (for el6). For everyone else there is 'pipelining'. It is turned off for backwards compatibility, but it is very easy to turn pipelining back on, and I recommend it.

Use 'with_items'

- name: project apt dependencies installed
  apt:
    name: "{{ item }}"
  become: yes
  with_items:
    - build-essential
    - acl
    - git
    - curl
    - gnupg2
    - libpcre3-dev
    - python-apt
    - python-pycurl
    - python-boto
    - imagemagick
    - libmysqlclient-dev # needed for data import

In this example we install packages; the same scheme can be used to create users and for similar operations.

Use 'local_action' and 'delegate_to' carefully. The first lets you get something similar to a SaltStack Runner; the second delegates tasks to specific machines.

- name: create postgresql database
  postgresql_db:
    name: "{{ database_name }}"
    login_host: "{{ database_host }}"
    login_user: "{{ database_master_user }}"
    login_password: "{{ database_master_password }}"
    encoding: "UTF-8"
    lc_collate: "en_US.UTF-8"
    lc_ctype: "en_US.UTF-8"
    template: "template0"
    state: present
  delegate_to: "{{ groups.pg_servers|random }}"

This is a fragment that creates a database. Without the last line, the operation would have been executed several times and would have failed on the second attempt to create the same database.

Optimize your roles and performance with tags. This can significantly reduce the execution time.
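
Why tags save time can be shown with a small Python sketch of tag-based selection. It is a simplified model, not Ansible's implementation: the task list loosely mirrors the includes from the Slack notifier example earlier, and `select_tasks` is a hypothetical helper.

```python
# Simplified model of a play: each task carries a set of tags.
tasks = [
    {"name": "configure", "tags": {"configure"}},
    {"name": "remote-fetch", "tags": {"remote"}},
    {"name": "assets", "tags": set()},
]


def select_tasks(tasks: list, wanted_tags: set) -> list:
    """Return the names of tasks to run for the requested tags.

    With no tags requested, everything runs; otherwise only tasks
    that share at least one tag with the request are selected.
    """
    if not wanted_tags:
        return [t["name"] for t in tasks]
    return [t["name"] for t in tasks if t["tags"] & wanted_tags]


print(select_tasks(tasks, set()))
print(select_tasks(tasks, {"configure"}))
```

Requesting a single tag runs one task instead of three; on a real role with dozens of tasks the difference in execution time is what makes tagging worthwhile.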

Conclusions

For me personally, Ansible is the favorite. SaltStack is very good and very flexible, but it requires knowledge of Python; without it, it is better not to use SaltStack. Chef is a universal silver bullet for any task and any scale, but it requires more knowledge than Ansible. As for who uses Puppet, I do not know. In principle, it is very similar to Chef, but with its own nuances.
