Puppet + Hiera: squeezing out the maximum


    In this article, I would like to talk about how we use Puppet and Hiera to configure our bare-metal and virtual servers. Mostly it covers the architecture and the hierarchy we came up with, which simplifies and systematizes server configuration.


    What prompted me to write this article is that I could not really find good, working examples on the Internet of how to work with hiera and why it is needed at all. Mostly you find tutorials with examples meant to get you into the topic, without showing real, practical use of hiera. Maybe I just searched badly, but here is a real example for you, which may help you dot the i's, as it once did for me.


    Who will find this article useful?


    If:


    • You know what Puppet and Hiera are, but don't really use them together, because it is not clear how to do it or why
    • You have a lot of teams in your company and need to somehow separate server configuration at the team level
    • You are using Puppet, and your node files have grown to incredible sizes
    • You like reading server configuration in the divine YAML format :)
    • You are simply interested in configuration management and system administration

    then this article is for you.


    Before you start


    I'll warn you right away: the article turned out to be long, but, I hope, useful. It is also assumed that you already have hiera hooked up to Puppet and are at least somewhat familiar with Puppet. If hiera is not connected yet, it is not hard to do.



    Input data


    • We have about 30 development teams in SEMrush, each of which has its own servers
    • Each team works with its own set of technologies (PL, DBMS, etc.)
    • Teams can and should (ideally) use a common configuration for some specific projects (code reuse)
    • The teams themselves manage the deployment of applications to their servers (this is not done via Puppet)

    A bit of history


    Initially, everything lived in Puppet 3; then we decided to introduce Puppet 4, all new servers started going into it, and the old ones were gradually ported over.


    In Puppet 3 we used the classic scheme of node files and modules. The modules were kept in a dedicated group of projects in GitLab and cloned onto the Puppet master (using r10k); then agents came to the master, received their catalog, and applied it on the server.


    Then we started trying to move away from local modules and instead listed the modules we needed and their repositories in the Puppetfile. Why? Because those modules are constantly maintained and improved (well, ideally) by the community and their developers, while our local ones are not. Later we introduced hiera and switched to it completely, and node files (such as nodes.pp) sank into oblivion.


    In Puppet 4 we tried to completely abandon local modules and use only remote ones. Unfortunately, a caveat is needed here again: "completely" did not work out, and sometimes you still have to clone something and finish it yourself. Of course, there is only hiera and no node files.


    When you have 30 teams with a zoo of technologies, the problem of maintaining that zoo across more than 1000 servers becomes especially acute. Below I will show how hiera helps us with this.


    Hierarchy


    The heart of hiera (and, in fact, where it gets its name) is the hierarchy it is configured with. Ours looks like this:


    ---
    :hierarchy:
     - "nodes/%{::fqdn}"
     - "teams/%{::team}_team/nodes/%{::fqdn}"
     - "teams/%{::team}_team/projects/%{::project}/tiers/%{::tier}"
     - "teams/%{::team}_team/projects/%{::project}/%{::role}"
     - "teams/%{::team}_team/projects/%{::project}"
     - "teams/%{::team}_team/roles/%{::role}"
     - "teams/%{::team}_team/%{::team}"
     - "projects/%{::project}/tiers/%{::tier}/%{::role}"
     - "projects/%{::project}/tiers/%{::tier}"
     - "projects/%{::project}/%{::role}"
     - "projects/%{::project}"
     - "tiers/%{::tier}"
     - "virtual/%{::virtual}"
     - "os/%{::operatingsystem}/%{::operatingsystemmajrelease}"
     - "os/%{::operatingsystem}"
     - users
     - common

    First, let's deal with the unfamiliar variables (facts).


    Each server in SEMrush should ideally expose 4 special facts describing where it belongs:


    • fact team - which team it belongs to
    • fact project - which project it belongs to
    • fact role - what role it has in this project
    • fact tier - which environment it is in (prod, test, dev)

    How does it work? The Puppet agent comes to the Puppet master and, based on these facts, configuration files are looked up for it by walking through the folders according to our hierarchy. There is no need to map configuration files to servers by hand; instead, the servers themselves know which files belong to them, based only on the file paths and their own facts.
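
    For example, take a hypothetical server (every value below is invented purely for illustration) with fqdn = myserver.semrush.net, team = analytics, project = blog, role = frontend and tier = prod. For it, hiera will walk, in order of priority, through files such as:


    nodes/myserver.semrush.net.yaml
    teams/analytics_team/nodes/myserver.semrush.net.yaml
    teams/analytics_team/projects/blog/tiers/prod.yaml
    teams/analytics_team/projects/blog/frontend.yaml
    teams/analytics_team/projects/blog.yaml
    teams/analytics_team/roles/frontend.yaml
    teams/analytics_team/analytics.yaml
    projects/blog/tiers/prod/frontend.yaml
    projects/blog/tiers/prod.yaml
    projects/blog/frontend.yaml
    projects/blog.yaml
    tiers/prod.yaml

    and then further down through the virtual/OS levels to users.yaml and common.yaml. Whatever files exist are used; the rest are simply skipped.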


    When a server is being set up, the admins contact the developers to clarify these parameters (often it is the other way round: developers who are in the know contact the admins themselves) in order to build the hiera hierarchy on which the server configuration will then be described. Such a system helps us reuse code and stay flexible in terms of server configuration.


    For example, we have the project special. It might include a frontend server with nginx, a backend server with Python, a DB cluster with MySQL, and a Redis server for caching. All of these servers are placed into the single project called special, and roles are then assigned to the servers.


    In the project file we describe the parameters common to the whole project. The first thing that comes to mind is creating a deployment user on all servers, granting it the necessary rights and rolling out its ssh keys.


    The role file for each server usually describes and customizes the service this server exists for (nginx, python, mysql, etc.). The tier comes in handy when we need to deploy a copy of the production environment on the dev site but change something in it (passwords, for example). In that case the dev servers and prod servers differ only in that the tier fact is set to the desired "position" (prod or dev), and then a bit of magic and hiera do their job.


    If we need to deploy two identical servers in the same role, but something has to differ between them, for example a few lines in a config, another part of the hierarchy comes to the rescue. We place files named {server fqdn}.yaml in the right place (for example, nodes/myserver.domain.net.yaml), set the necessary variable values at the level of that specific server, and Puppet applies to both servers the configuration common to the role plus the configuration unique to each of them.


    Example: two backends with PHP code share the same role and are completely identical. Clearly we do not want to back up both servers, there is no point. We can create a role describing the configuration common to both servers, and then create an additional file nodes/backend1.semrush.net.yaml in which we place the backup configuration.
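
    As a very rough sketch (the profile and parameter names below are made up; your backup module will have its own), that per-node file could look something like this:


    ---
    classes:
      - profiles::backup

    profiles::backup::jobs:
      'php-code':
        path: '/srv/www'
        schedule: 'daily'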


    The team file teams/team-name.yaml specifies the configuration for all servers belonging to the team. Most often it describes the users who may interact with those servers and their access rights.


    It is on these variables that we built our hierarchy. The higher in the hierarchy a file is found, the higher the priority of the configuration specified in it.


    It follows that variables can be overridden according to this hierarchy: a variable in the role file "projects/%{::project}/%{::role}" has a higher priority than the same variable in the project file "projects/%{::project}". Variables can also be merged across all levels of the hierarchy, if the module and/or profile/role is written to allow it. Having specified the common part of the mysql config for all servers of the project, you can add role-specific pieces to the same variable at other hierarchy levels (for the slave there will be an extra section in the config).
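
    A small sketch of how that looks in practice (the key profiles::db::mysql::config is invented here, and whether the hashes actually get merged depends on your hiera merge behaviour and on how the profile looks the value up):


    # projects/<project>.yaml - the part common to all servers of the project
    profiles::db::mysql::config:
      'max_connections': '500'
      'innodb_buffer_pool_size': '4G'

    # projects/<project>/slave.yaml - the extra section that only the slave role gets
    profiles::db::mysql::config:
      'read_only': 'ON'

    With a hash/deep merge, the slave servers end up with all three settings, while the other roles only see the common two.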


    So the file of a specific node, located at "hieradata/nodes/%{::fqdn}", has the highest priority. Next comes the node file at the team level. Below that goes the block describing other, more general facts:


     - "virtual/%{::virtual}"
     - "os/%{::operatingsystem}/%{::operatingsystemmajrelease}"
     - "os/%{::operatingsystem}"
     - users
     - common

    Accordingly, the file common.yaml holds the configuration that simply has to reach every server, the file users.yaml describes all users (not all of them get created on every server, of course), os/%{::operatingsystem} holds the general configuration inherent to servers with a particular OS (the ::operatingsystem fact is used), and so on.
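
    Purely as an illustration (the class names here are made up), such files might contain something like:


    # common.yaml - reaches every server
    classes:
      - semrush_base

    # os/Debian.yaml - applied only where the operatingsystem fact is Debian
    classes:
      - profiles::os::debian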


    I think the hierarchy itself makes things fairly clear by now. Below I will walk through an example of using it, but first we need to talk about profiles.


    Profiles


    An important part of configuring servers with modules is the use of profiles. They live under site/profiles and are the entry points to the modules. Thanks to them, you can fine-tune the modules applied to a server and create any extra resources you need.


    Consider a simple example. There is a module that installs and configures Redis. We also want to set the sysctl parameter vm.overcommit_memory to 1 whenever this module is connected, as the Redis FAQ recommends (the link is in the comment in the profile below). So we write a small profile that provides this functionality:


    # standalone redis server
    class profiles::db::redis (
      Hash $config = {}, 
      String $output_buffer_limit_slave = '256mb 64mb 60',
    ) {
      # https://redis.io/topics/faq#background-saving-fails-with-a-fork-error-under-linux-even-if-i-have-a-lot-of-free-ram
      sysctl { 'vm.overcommit_memory':
        ensure => present,
        value  => '1',
      }
      class { '::redis':
        * => $config,
      }
    }
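
    A role or node file in hiera can then connect this profile roughly like this (a minimal sketch; the keys inside the config hash are illustrative and depend on the parameters of the redis module you use):


    ---
    classes:
      - profiles::db::redis

    profiles::db::redis::config:
      bind: '0.0.0.0'
      maxmemory: '2gb'
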

    As mentioned above, profiles are a tool for changing or improving a module's behavior, as well as for reducing the amount of configuration kept in hiera. If you use remote modules, you will often run into the problem that the "approved" modules lack functionality you need, or have bugs and flaws. In principle, you can clone such a module and fix or extend it yourself. But the better decision, whenever possible, is to write a good profile that can "prepare" the module the way you want. Below are a few more examples of profiles, which should make clearer what they are for.


    Hiding secrets in hiera


    One of hiera's important advantages over "bare" Puppet is its ability to store sensitive data in configuration files in encrypted form right in the repository. Your passwords will be safe.


    In short, you use the public key to encrypt the necessary information and place the resulting string in a hiera file. The private key, which allows the data to be decrypted, is stored on the Puppet master. More details can be found on the project page.
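
    For reference, wiring the eyaml backend into a hiera 3 style hiera.yaml looks roughly like this (a sketch; the datadir and key paths are assumptions and depend on your installation):


    ---
    :backends:
      - eyaml
      - yaml
    :eyaml:
      :datadir: /etc/puppetlabs/code/environments/%{environment}/hieradata
      :pkcs7_private_key: /etc/puppetlabs/puppet/eyaml/private_key.pkcs7.pem
      :pkcs7_public_key: /etc/puppetlabs/puppet/eyaml/public_key.pkcs7.pem
    :hierarchy:
      # ... the hierarchy shown above ...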


    On the client side (your work machine) the tool is installed simply, for example via gem install hiera-eyaml. Then, with a command like eyaml encrypt --pkcs7-public-key=/path/to/public_key.pkcs7.pem -s 'hello', you can encrypt the data and paste it into a file with the eyaml extension or plain yaml, depending on how you configured it, and Puppet will figure it out. You end up with something like:


    roles::postgresql::password: 'ENC[PKCS7,MIIBeQYJKoZIhvcNAQcDoIIBajCCAWYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAbIz1ihQlThMWa9T+Lq194Y6QdElMD1XTev5y+VPSHtkPTu6Al6TJaSrXF+7phJIjue+NF4ZVtJCLkHxUR6nJJqks0fcGS1vF2+6mmM9cy69sIU1A3HqpOHZLuqHAc7jUqljYxpwWSIGOK6I2FygdAp5FfOTewqfcVVmXj97EJdcv3DKrbAlSrIMO2iZRYwQvyv+qnptnZ7pilR2veOCPW2UMm6zagDLutX9Ft5vERbdaiCiEfTOpVa9Qx0GqveNRVJLV/5lfcL5ajdNBJXkvKqDbx8d3ZBtEVAAqeKlw0LqzScgmCbWQx2kUzukX5LSxbTpT0Th984Vp1sl7iPk7UTA8BgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCp5GcwidcEMA+0wjAMblkKgBCR/f9KGXUgLh3/Ok60OIT5]'

    Or as a multi-line string:


    roles::postgresql::password: >
        ENC[PKCS7,MIIBeQYJKoZIhvcNAQcDoIIBajCCAWYCAQAxggEhMIIBHQIBADAFMAACAQEw
        DQYJKoZIhvcNAQEBBQAEggEAbIz1ihQlThMWa9T+Lq194Y6QdElMD1XTev5y
        +VPSHtkPTu6Al6TJaSrXF+7phJIjue+NF4ZVtJCLkHxUR6nJJqks0fcGS1vF
        2+6mmM9cy69sIU1A3HqpOHZLuqHAc7jUqljYxpwWSIGOK6I2FygdAp5FfOTe
        wqfcVVmXj97EJdcv3DKrbAlSrIMO2iZRYwQvyv+qnptnZ7pilR2veOCPW2UM
        m6zagDLutX9Ft5vERbdaiCiEfTOpVa9Qx0GqveNRVJLV/5lfcL5ajdNBJXkv
        KqDbx8d3ZBtEVAAqeKlw0LqzScgmCbWQx2kUzukX5LSxbTpT0Th984Vp1sl7
        iPk7UTA8BgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBCp5GcwidcEMA+0wjAM
        blkKgBCR/f9KGXUgLh3/Ok60OIT5]

    That finishes the preparation; now we can move on to the example.


    A hands-on example


    Spoiler: there is a lot of configuration ahead, so readers for whom this article is of purely theoretical interest can skip this section and jump to the end.


    Let's now look at an example of configuring servers with hiera in Puppet 4. I will not publish the code of every profile, otherwise the post would get really long; I will focus on the hiera hierarchy and configuration.


    The task: we need to deploy the following:


    • Two identical database servers running PostgreSQL
    • Two more servers: frontends with nginx
    • The fifth and sixth servers: Python backends in Docker
    • The same setup in the dev environment, except for some of the server configuration

    We will build our hierarchy step by step, starting with the project file.


    Project


    Create the project file projects/kicker.yaml. Put into it what is common to all servers: we need some repositories and deployment folders, as well as the deploy user.


    ---
    classes:
      - apt::debian::semrush
    files:
      "/srv/data":
        ensure: 'directory'
        owner: 'deploy'
        group: 'www-data'
        mode: '0755'
      "/srv/data/shared_temp":
        ensure: 'directory'
        owner: 'deploy'
        group: 'www-data'
        mode: '0775'
    user_management::present:
      - deploy

    Db role


    Create the role file for the database servers, projects/kicker/db.yaml. For now, let's do without splitting the servers into environments:


    ---
    classes:
      - profiles::db::postgresql
    profiles::db::postgresql::globals:
      manage_package_repo: true
      version: '10'
    profiles::db::postgresql::db_configs:
      'listen_addresses':
        value: '*'
    profiles::db::postgresql::databases:
      kicker: {}
    profiles::db::postgresql::hba_rules:
      'local connect to kicker':
        type: 'local'
        database: 'kicker'
        user: 'kicker'
        auth_method: 'md5'
        order: '001'
      'allow connect from 192.168.1.100':
        type: 'host'
        database: 'kicker'
        user: 'kicker'
        auth_method: 'md5'
        address: '192.168.1.100/32'
        order: '002'

    Here we connect a single profile, written for general use by anyone who wants to install Postgres on their servers. The profile is parameterized and lets you flexibly configure the module before using it.


    For the most curious, the code of this profile is below:


    Profile profiles::db::postgresql
    class profiles::db::postgresql (
      Hash $globals = {},
      Hash $params = {},
      Hash $recovery = {},
      Hash[String, Hash[String, Variant[String, Boolean, Integer]]] $roles = {},
      Hash[String, Hash[String, Variant[String, Boolean]]] $db_configs = {},
      Hash[String, Hash[String, Variant[String, Boolean]]] $databases = {},
      Hash[String, String] $db_grants = {},
      Hash[String, Hash[String, String]] $extensions = {},
      Hash[String, String] $table_grants = {},
      Hash[String, Hash[String, String]] $hba_rules = {},
      Hash[String, String] $indent_rules = {},
      Optional[String] $role = undef, # 'master', 'slave'
      Optional[String] $master_host = undef,
      Optional[String] $replication_password = undef,
      Integer $master_port = 5432,
      String $replication_user = 'repl',
      String $trigger_file = '/tmp/pg_trigger.file',
    ){
      case $role {
        'slave': {
          $_params = {
            manage_recovery_conf => true,
          }
          if $globals['datadir'] {
            file { "${globals['datadir']}/recovery.done":
              ensure => absent,
            }
          }
          $_recovery = {
            'recovery config' => {
              standby_mode     => 'on',
              primary_conninfo => "host=${master_host} port=${master_port} user=${replication_user} password=${replication_password}",
              trigger_file     => $trigger_file,
            }
          }
          $_conf = {
            'hot_standby' => {
              value => 'on',
            },
          }
          file { $trigger_file:
            ensure => absent,
          }
        }
        'master': {
          $_params = {}
          $_recovery = {}
          $_conf = {
            'wal_level' => {
              value => 'replica',
            },
            'max_wal_senders' => {
              value => 5,
            },
            'wal_keep_segments' => {
              value => 32,
            },
          }
          file { $trigger_file:
            ensure => present,
          }
        }
        default: {
          $_params = {}
          $_recovery = {}
          $_conf = {}
        }
      }
      class { '::postgresql::globals':
        * => $globals,
      }
      class { '::postgresql::server':
        * => deep_merge($_params, $params),
      }
      create_resources('::postgresql::server::config_entry', deep_merge($_conf, $db_configs))
      create_resources('::postgresql::server::role', $roles)
      create_resources('::postgresql::server::database', $databases)
      create_resources('::postgresql::server::database_grant', $db_grants)
      create_resources('::postgresql::server::extension', $extensions)
      create_resources('::postgresql::server::table_grant', $table_grants)
      create_resources('::postgresql::server::pg_hba_rule', $hba_rules)
      create_resources('::postgresql::server::pg_ident_rule', $indent_rules)
      create_resources('::postgresql::server::recovery', deep_merge($_recovery, $recovery))
    }

    Thus, in one fell swoop we install PostgreSQL 10, configure listen_addresses in its config, create the kicker database, and write two rules into pg_hba.conf for access to this database. Cool!


    Role frontend


    Now let's take on the frontend. Create the file projects/kicker/frontend.yaml with the following content:


    ---
    classes:
      - profiles::webserver::nginx
    profiles::webserver::nginx::servers:
      'kicker.semrush.com':
        use_default_location: false
        listen_port: 80
        server_name:
          - 'kicker.semrush.com'
    profiles::webserver::nginx::locations:
      'kicker-root':
        location: '/' 
        server: 'kicker.semrush.com'
        proxy: 'http://kicker-backend.semrush.com:8080'
        proxy_set_header:
          - 'X-Real-IP $remote_addr'
          - 'X-Forwarded-for $remote_addr'
          - 'Host kicker.semrush.com'
        location_cfg_append:
          'proxy_next_upstream': 'error timeout invalid_header http_500 http_502 http_503 http_504'
        proxy_connect_timeout: '5'

    Everything is simple here. We connect the profile profiles::webserver::nginx, which prepares the entry point into the nginx module, and define variables, specifically a server and a location for the site.


    An attentive reader will notice that it would be more correct to put the site description higher in the hierarchy, since the dev environment will use different values of some variables (server_name, proxy), but that is not so important here. By describing the role this way, we will get to see exactly how those variables are overridden through the hierarchy.


    Role docker


    The docker role, projects/kicker/docker.yaml, remains:


    ---
    classes:
      - profiles::docker
    profiles::docker::params:
      version: '17.05.0~ce-0~debian-stretch'
      packages:
        'python3-pip':
          provider: apt
        'Fabric3':
          provider: pip3
          ensure: 1.12.post1
    user_management::users:
      deploy:
        groups:
          - docker

    The profile profiles/docker.pp is very simple and elegant. Here is its code:


    Profile profiles::docker
    class profiles::docker (
      Hash $params = {}, 
      Boolean $install_kernel = false,
    ){
      class { 'docker':
        * => $params,
      }
      if ($install_kernel) {
        include profiles::docker::kernel
      }
    }

    Everything is ready. This is already enough to deploy the product we need on several servers, simply by assigning each of them the right project and role (for example, by placing a facts file in the required format into the facts.d directory, whose location depends on how you installed Puppet).
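
    Such a facts file might, purely as a sketch (the path and the values are assumptions), look like this, e.g. /etc/puppetlabs/facter/facts.d/semrush.yaml on the agent:


    ---
    team: 'analytics'
    project: 'kicker'
    role: 'frontend'
    tier: 'prod'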


    Now we have the following file structure:


    .
    ├── kicker
    │   ├── db.yaml
    │   ├── docker.yaml
    │   └── frontend.yaml
    └── kicker.yaml
    1 directory, 4 files

    Now let's deal with environments and with defining configuration that is unique to a role in a specific environment.


    Environments and overrides


    Let's create a common configuration for the entire production environment. The file projects/kicker/tiers/prod.yaml states that a firewall class must be connected in this environment (it is production, after all), as well as a certain class that provides a stricter security level:


    ---
    classes:
      - semrush_firewall
      - strict_security_level

    For the dev environment, if something specific needs to be described, the same kind of file is created and the necessary configuration goes into it.
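
    For instance, a minimal projects/kicker/tiers/dev.yaml might (again, just an illustration) keep the firewall but skip the strict security class:


    ---
    classes:
      - semrush_firewall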


    Next, we still need to override the nginx config variables of the frontend role in the dev environment. To do that, create the file projects/kicker/tiers/dev/frontend.yaml. Note the new hierarchy level:


    ---
    profiles::webserver::nginx::servers:
      'kicker-dev.semrush.com':
        use_default_location: false
        listen_port: 80
        server_name:
          - 'kicker-dev.semrush.com'
    profiles::webserver::nginx::locations:
      'kicker-root':
        location: '/' 
        server: 'kicker-dev.semrush.com'
        proxy: 'http://kicker-backend-dev.semrush.com:8080'
        proxy_set_header:
          - 'X-Real-IP $remote_addr'
          - 'X-Forwarded-for $remote_addr'
          - 'Host kicker-dev.semrush.com'
        location_cfg_append:
          'proxy_next_upstream': 'error timeout invalid_header http_500 http_502 http_503 http_504'
        proxy_connect_timeout: '5'

    The class does not need to be specified again, it is inherited from the previous hierarchy levels. Here we changed server_name and proxy. A server with the facts role = frontend and tier = dev will first find projects/kicker/frontend.yaml for itself, but then the variables from that file will be overridden by the higher-priority file projects/kicker/tiers/dev/frontend.yaml.


    Password Hiding for PostgreSQL


    And so we come to the last item on the agenda: setting the passwords for PostgreSQL.


    The passwords must differ between environments. We will use eyaml to store them safely. Create the passwords:


    eyaml encrypt -s 'verysecretpassword'
    eyaml encrypt -s 'testpassword'

    We paste the resulting strings into the files projects/kicker/tiers/prod/db.yaml and projects/kicker/tiers/dev/db.yaml respectively (or you can use the eyaml extension, this is configurable). Here is an example:


    ---
    profiles::db::postgresql::roles:
      'kicker':
        password_hash: 'ENC[PKCS7,MIIBeQYJKoZIhvcNAQcDoIIBajCCAWYCAQAxggEhMIIBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAsdpb2P0axUJzyWr2duRKAjh0WooGYUmoQ5gw0nO9Ym5ftv6uZXv25DRMKh7vsbzrrOR5/lLesx/pAVmcs2qbhd/y0Vr1oc2ohHlZBBKtCSEYwem5VN+kTMhWPvlt93x/S9ERoBp8LrrsIvicSYZByNfpS2DXCFbogSXCfEPxTTmCOtlOnxdjidIc9Q1vfAXv7FRQanYIspr2UytScm56H/ueeAc/8RYK51/nXDMtdPOiAP5VARioUKyTDSk8FqNvdUZRqA3cl+hA+xD5PiBHn5T09pnH8HyE/39q09gE0pXRe5+mOnU/4qfqFPc/EvAgAq5mVawlCR6c/cCKln5wJTA8BgkqhkiG9w0BBwEwHQYJYIZIAWUDBAEqBBDNKijGHBLPCth0sfwAjfl/gBAaPsfvzZQ/Umgjy1n+im0s]'

    From then on, the password for the kicker role will be delivered, decrypted, and applied on the database server in PostgreSQL.


    That is basically all. Yes, the example turned out to be massive, but, I hope, it is functional, leaves no questions hanging, and is understandable and useful. The final hierarchy for the project in hiera came out like this:


    .
    ├── db.yaml
    ├── docker.yaml
    ├── frontend.yaml
    └── tiers
        ├── dev
        │   ├── db.yaml
        │   └── frontend.yaml
        ├── prod
        │   └── db.yaml
        └── prod.yaml
    3 directories, 7 files

    You can view these files live by cloning the specially created repository.


    Conclusion


    Puppet in conjunction with hiera is nice and handy. I would not call it the ideal configuration tool in the modern world, far from it, but it deserves attention. It copes with some tasks very well, and its "philosophy" of constantly maintaining the same state of resources and configuration can play an important role in ensuring security and uniformity of configuration.


    The modern world is gradually converging and evolving. Few people use just one configuration system nowadays; devops engineers and admins often have several systems in their arsenal at once. And that is good, since there is plenty to choose from. The main thing is that everything stays logical, and that it is clear how and where things are configured.


    Our goal as admins, in the end, is not to configure anything ourselves. Ideally, all of this should be done by the teams themselves, and we should give them a tool or product that lets them do it safely, easily and, most importantly, with a predictable result. And also help with architectural and more serious tasks than "You need to install PostgreSQL on the server and create a user." Come on, it's 2018! So throw out puppet and ansible and move to a serverless future.


    With the rise of clouds, containerization and container orchestration systems, configuration management systems are slowly fading into the background for users and customers. You can spin up a fault-tolerant cluster of containers in the cloud and keep your applications in containers with auto-scaling, backup, replication, auto-discovery and so on, without writing a single line for ansible, puppet, chef, etc. You don't have to take care of anything (well, almost). On the other hand, clouds have not reduced the number of bare-metal servers; you simply no longer have to configure them, since that is now the cloud provider's responsibility. But the providers are unlikely to use the same systems as us mere mortals.


    Credits


    Thanks to:


    • Dmitry Tupitsin, Dmitry Loginov, Stepan Fedorov and the whole system administrators team for helping to prepare this article
    • Vladimir Legkostupov for the picture
    • Yana Tabakova for organizing this whole thing and helping to complete all the pre-publishing stages
    • Nikita Zakharov for his help in licensing
