Puppet. Part 1: Introduction to Hiera

This is the first of three articles in which I want to share my view of the problem of managing large infrastructures with Puppet. The first part is an introduction to Hiera, Puppet's powerful tool for organizing hierarchical data. It is aimed at people who already know Puppet but are not yet familiar with Hiera. I will try to give the basic knowledge about this tool and show how it makes managing a large number of servers easier.

You probably know, or can imagine, that managing a large infrastructure with Puppet is not an easy task. For ten servers Puppet is not needed, and for fifty you can write the code any way you like, but once you reach 500+ servers you have to think seriously about optimizing your effort. Unfortunately, Puppet does not seem to have been designed with large infrastructures in mind from the start; at the very least, its hierarchy support was originally very poor. Standard node definitions are completely inapplicable in large companies. Node inheritance (like class inheritance) is not recommended by Puppet Labs at all; instead, it is better to load hierarchy data from external sources such as Hiera and an External Node Classifier (ENC).
Although the concept of an ENC is not that different from Hiera, I do not really like specific ENC implementations such as Puppet Dashboard and Foreman. I will explain why:

1) The data about my infrastructure lives somewhere in the application's database. How do I get it out of there if the application crashes? I don't know. I can speculate, but I don't know for sure.
2) Powerful ENCs scale poorly and with difficulty precisely because of their power. In contrast, Hiera stores all of its data as text. Text data is very easy to sync between several Puppet masters via git and r10k, should the need arise. In general, text configuration is the UNIX way, no matter how old-fashioned that sounds.

Again, I do not deny the potential of Puppet Dashboard and Foreman as monitoring and reporting tools. A nice web interface with graphs and pictures is necessary, but only as a means of viewing, not as a means of changing the configuration of your infrastructure. I also know that Foreman can do a lot besides Puppet (Red Hat Satellite Server 6 and the Foreman-based Katello project are prime examples). Still, as the place to store the configuration of my entire infrastructure, I like Hiera more.

What is Hiera? It is a Ruby library that is included with Puppet by default and helps you organize your data in Puppet better. Can you do without it? You can. You can write all the numbers and parameters directly in the manifests, but past a certain stage of growth the manifests become truly terrifying, and it gets harder and harder to remember what is stored where and what it is responsible for.

What do you gain from using Hiera? You begin to separate the specific parameters of your infrastructure (user UIDs, ssh keys, DNS settings, various centralized files, etc.) from the Puppet code that actually applies them. As a result, if one day you need to find out the UID of a particular user on a particular server, or even on a group of servers, you will know exactly where this information is stored, and you won't have to scroll frantically through all your manifests looking for the right user and trying to predict what changing the UID "here" will lead to. Of course, you should not expect miracles from Hiera. After all, it is just a way to store and organize your data.

But enough lyricism, let's get down to business. Hiera (from "hierarchy") operates on a hierarchy. I defined the following hierarchy in /etc/puppet/hiera.yaml:
:hierarchy:
    - "%{::environment}/nodes/%{::fqdn}"
    - "%{::environment}/roles"
    - "%{::environment}/%{::environment}"
    - common
:backends:
    - yaml
:yaml:
    :datadir: '/etc/puppet/hiera'

Remember this hierarchy; I will use it actively from here on.
For those who are not very familiar with Hiera, let me explain. We set the /etc/puppet/hiera directory as the Hiera data store. Files in this directory must have the .yaml extension and contain data in YAML format. Next, we specify the names of the files that Hiera will expect to find in its directory. Since Hiera is called from Puppet code, the same variables are available to it as to Puppet, including facts. Each node also has a built-in environment value, which can be used in Hiera as the variable %{::environment}. The FQDN of a node predictably looks like %{::fqdn} in Hiera. Thus, this hierarchy corresponds to the following file structure:

/etc/puppet/hiera/
|-- common.yaml
|-- production/
|----- production.yaml
|----- roles.yaml
|----- nodes/
|-------- prod-node1.yaml
|-------- prod-node2.yaml
|-- development/
|----- development.yaml
|----- roles.yaml
|----- nodes/
|-------- dev-node1.yaml
|-------- dev-node2.yaml
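
To make the mapping from hierarchy levels to files concrete, here is a minimal Python sketch. It is only a toy model, not Hiera's actual code: the function name candidate_files is hypothetical, and skipping a level whose variables are unknown is a simplification I chose for readability.

```python
import re

# The hierarchy and datadir from the hiera.yaml shown above.
HIERARCHY = [
    "%{::environment}/nodes/%{::fqdn}",
    "%{::environment}/roles",
    "%{::environment}/%{::environment}",
    "common",
]
DATADIR = "/etc/puppet/hiera"

# Matches interpolation tokens such as %{::environment} or %{fqdn}.
TOKEN = re.compile(r"%\{(?:::)?(\w+)\}")


def candidate_files(scope):
    """Return the data files a lookup would consult, highest priority first."""
    paths = []
    for level in HIERARCHY:
        names = TOKEN.findall(level)
        # Simplification of this sketch: skip a level if a variable is unknown.
        if any(name not in scope for name in names):
            continue
        resolved = TOKEN.sub(lambda m: scope[m.group(1)], level)
        paths.append(f"{DATADIR}/{resolved}.yaml")
    return paths


for path in candidate_files({"environment": "production", "fqdn": "prod-node1"}):
    print(path)
```

For the node prod-node1 in the production environment this lists the node file first, then roles.yaml, then production.yaml, and finally common.yaml, which is exactly the top-to-bottom order of the levels in hiera.yaml.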


The order of the levels in hiera.yaml (not in the file structure) matters. Hiera scans the hierarchy from top to bottom, and what happens next depends on the lookup method you use in the Puppet manifest. There are three methods, and I will demonstrate them with an example. With the hierarchy described by the hiera.yaml file above, create three files with the following contents:
/etc/puppet/hiera/common.yaml
classes:
  - common_class1
  - common_class2
roles:
  common_role1:
    key1: value1
    key2: value2
common: common_value

/etc/puppet/hiera/production/production.yaml
classes:
  - production_class1
  - production_class2
roles:
  production_role1:
    key1: value1
    key2: value2
production: production_value

/etc/puppet/hiera/production/nodes/testnode.yaml
classes:
  - node_class1
  - node_class2
roles:
  node_role1:
    key1: value1
    key2: value2
node: node_value

Hiera supports command-line queries. In fact, the console is the easiest way to understand how it works. By default, the hiera command keeps its config in /etc/hiera.yaml, so you need to make this file a symlink to /etc/puppet/hiera.yaml. After that, we make a simple call:
[root@testnode]# hiera classes
["common_class1", "common_class2"]
Since this query provided no information about the environment or the fqdn, Hiera takes the data from the lowest level of the hierarchy, the common.yaml file. Arrays are displayed in square brackets. Let's try providing the environment:
[root@testnode]# hiera classes ::environment=production
["production_class1", "production_class2"]
[root@testnode]# hiera classes ::environment=production ::fqdn=testnode
["node_class1", "node_class2"]
The data from production.yaml is higher in the hierarchy, so it takes priority and overrides the data from common.yaml. Similarly, the data from testnode.yaml overrides the data from production.yaml. However, if a key is missing at the higher levels of the hierarchy, its value is, logically, taken from the lower levels:
[root@testnode]# hiera common ::environment=production
common_value
[root@testnode]# hiera production ::environment=production ::fqdn=testnode
production_value
In this case strings are returned rather than arrays, in accordance with the files above.
This type of query is called a priority lookup. As you can see, it always returns the first value found in the hierarchy (the one with the highest priority) and then stops without examining the lower levels. In Puppet, the standard hiera() function corresponds to it. In our example this would be the call hiera('classes'). Since Puppet always calls Hiera with the node's context (its facts and environment), we don't need to specify anything else in the query.
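
The priority lookup can be modeled in a few lines of Python. This is a sketch of the behavior seen in the console session, not Hiera's implementation: the dictionaries hold a subset of the three example files, and the name priority_lookup and the None result for a missing key are my own conventions.

```python
# A subset of the three example files, as plain dictionaries.
NODE = {"classes": ["node_class1", "node_class2"], "node": "node_value"}
PRODUCTION = {
    "classes": ["production_class1", "production_class2"],
    "production": "production_value",
}
COMMON = {"classes": ["common_class1", "common_class2"], "common": "common_value"}


def priority_lookup(key, levels):
    """Return the first value found for key, scanning levels top to bottom."""
    for data in levels:
        if key in data:
            return data[key]
    return None  # convention of this sketch: None when the key is absent everywhere


# No environment or fqdn given: only common.yaml is consulted.
print(priority_lookup("classes", [COMMON]))
# Environment given: production.yaml wins over common.yaml.
print(priority_lookup("classes", [PRODUCTION, COMMON]))
# The key is missing at the node level, so the search falls through.
print(priority_lookup("production", [NODE, PRODUCTION, COMMON]))
```

The list passed as levels plays the role of the resolved hierarchy: the more facts you supply, the more files end up in front of common.yaml.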

The next type of query is the array merge. Let's take a look:
[root@testnode]# hiera --array classes
["common_class1", "common_class2"]
[root@testnode]# hiera --array classes ::environment=production
["production_class1", "production_class2", "common_class1", "common_class2"]
[root@testnode]# hiera --array classes ::environment=production ::fqdn=testnode
["node_class1", "node_class2", "production_class1", "production_class2", "common_class1", "common_class2"]
This type of query walks through all levels of the hierarchy and collects all the values found (strings and arrays) into one flat array. In Puppet this query is called hiera_array(). However, this type of query cannot collect hashes. If it encounters a hash along the way, it throws an error:
[root@testnode]# hiera --array roles
/usr/share/ruby/vendor_ruby/hiera/backend/yaml_backend.rb:38:in `block in lookup': Hiera type mismatch: expected Array and got Hash (Exception)
In a similar situation, the priority lookup will go fine and return a hash (in curly brackets):
[root@testnode]# hiera roles
{"common_role1"=>{"key1"=>"value1", "key2"=>"value2"}}
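
Here is the array merge as a toy Python model, again only a sketch under my own assumptions: the name array_lookup is hypothetical, the data is a subset of the example files, and dropping duplicate values reflects my understanding of how hiera_array() is documented rather than something shown in the session above.

```python
# A subset of the three example files, highest priority first.
NODE = {"classes": ["node_class1", "node_class2"]}
PRODUCTION = {"classes": ["production_class1", "production_class2"]}
COMMON = {
    "classes": ["common_class1", "common_class2"],
    "roles": {"common_role1": {"key1": "value1", "key2": "value2"}},
    "common": "common_value",
}


def array_lookup(key, levels):
    """Collect every value of key across all levels into one flat list."""
    merged = []
    for data in levels:
        if key not in data:
            continue
        value = data[key]
        if isinstance(value, dict):
            # Mirrors the "expected Array and got Hash" failure shown above.
            raise TypeError("type mismatch: expected Array and got Hash")
        merged.extend(value if isinstance(value, list) else [value])
    # Assumption: duplicates are dropped, keeping the first occurrence.
    return list(dict.fromkeys(merged))


print(array_lookup("classes", [NODE, PRODUCTION, COMMON]))
```

Unlike the priority lookup, the loop never stops early: every level that defines the key contributes to the result.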

What if we need to collect hashes? We use the third type of query, the hash merge:
[root@testnode]# hiera --hash roles
{"common_role1"=>{"key1"=>"value1", "key2"=>"value2"}}
[root@testnode]# hiera --hash roles ::environment=production
{"common_role1"=>{"key1"=>"value1", "key2"=>"value2"},
 "production_role1"=>{"key1"=>"value1", "key2"=>"value2"}}
[root@testnode]# hiera --hash roles ::environment=production  ::fqdn=testnode
{"common_role1"=>{"key1"=>"value1", "key2"=>"value2"},
 "production_role1"=>{"key1"=>"value1", "key2"=>"value2"},
 "node_role1"=>{"key1"=>"value1", "key2"=>"value2"}}
This request, similar to the previous one, goes through all levels of the hierarchy and collects all hashes into one large common hash. It is easy to guess that when you try to collect arrays or strings with it, it will return an error:
[root@testnode]# hiera --hash classes
/usr/share/ruby/vendor_ruby/hiera/backend/yaml_backend.rb:42:in `block in lookup': Hiera type mismatch: expected Hash and got Array (Exception)
In Puppet this query is called hiera_hash(). What happens if the same hash key has different values at different levels of the hierarchy? For example, what if the user test has UID = 100 at the common level and UID = 200 at the level of the node testnode? In this case, for each specific key the hash lookup behaves like a priority lookup, that is, it returns the value with the higher priority. Read more about this here.
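
The conflict case is easier to see in code. The following Python sketch models a hash merge over top-level keys only, which is how I understand Hiera's default merge behavior; the name hash_lookup and the sample users data are hypothetical.

```python
# Hypothetical data: the user "test" is defined at two levels.
NODE = {"users": {"test": {"uid": 200}}}
COMMON = {
    "users": {
        "test": {"uid": 100, "shell": "/bin/sh"},
        "admin": {"uid": 50},
    },
    "classes": ["common_class1"],
}


def hash_lookup(key, levels):
    """Merge the hashes found for key; higher levels win per top-level key."""
    merged = {}
    for data in levels:  # highest priority first
        if key not in data:
            continue
        value = data[key]
        if not isinstance(value, dict):
            # Mirrors the "expected Hash and got Array" failure shown above.
            raise TypeError("type mismatch: expected Hash and got "
                            + type(value).__name__)
        # Keys already collected from higher levels override the new ones.
        merged = {**value, **merged}
    return merged


print(hash_lookup("users", [NODE, COMMON]))
```

Note that in this model the whole entry for test comes from the node level, so the shell defined in common is not carried over: the merge is per top-level key, not recursive.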

Okay, cool (or not), but why do we need all this?
Puppet automatically scans Hiera for parameters it can use (in 3.x versions you do not even need to configure anything for this).
To begin with, here is a simple, slightly modified example from the Puppet site (by the way, the example there currently shows the obsolete ntp::autoupdate and ntp::enable parameters; I use their current names below). We will torment the long-suffering puppetlabs-ntp module. Suppose we want to express the following ntp configuration in Puppet:
/etc/ntp.conf
tinker panic 0
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1
server 0.pool.ntp.org iburst burst
server 1.pool.ntp.org iburst burst
server 2.pool.ntp.org iburst burst
server 3.pool.ntp.org iburst burst
driftfile /var/lib/ntp/drift

To do this, add the following lines to common.yaml in Hiera (the module's template writes the restrict keyword itself, so the values do not repeat it):
classes:
  - ntp
ntp::restrict:
  - default kod nomodify notrap nopeer noquery
  - -6 default kod nomodify notrap nopeer noquery
  - 127.0.0.1
  - -6 ::1
ntp::service_ensure: running
ntp::service_enable: true
ntp::servers:
  - 0.pool.ntp.org iburst burst
  - 1.pool.ntp.org iburst burst
  - 2.pool.ntp.org iburst burst
  - 3.pool.ntp.org iburst burst
It is easy to see that these are simply the specific values of the ntp class parameters that will be passed to the class when it is declared. These parameters are defined in the header of the ntp class (the file modules/ntp/manifests/init.pp). When passing parameters to a class from Hiera this way, you must use fully qualified names so that Puppet loads them into the right scope.
The only thing left to do is add one line to the main Puppet manifest of your environment (site.pp):
hiera_include('classes')
This line, for all its simplicity and brevity, does a lot of work behind the scenes. First, Puppet walks through all (!) levels of the Hiera hierarchy and loads every class listed under the classes key at any level. Then, for each loaded class, Puppet looks up the class parameters in Hiera by their fully qualified names and passes the values it finds to the class. Without a loaded class its variables lose all meaning: if you remove ntp from the classes list, remember to remove the ntp::* variables from the YAML file as well, because nothing will use them anymore.
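
As I understand the mechanism, it can be sketched in Python like this. Everything here is a toy model: declare_classes is a hypothetical name, the class defaults are made up for illustration, and a single flat dictionary stands in for the already merged Hiera data.

```python
# Hypothetical class signatures: parameter names with their default values.
CLASS_DEFAULTS = {
    "ntp": {
        "servers": ["0.pool.ntp.org"],
        "service_enable": False,
        "service_ensure": "stopped",
    },
}

# Stand-in for the merged Hiera data (keys as in common.yaml above).
HIERA_DATA = {
    "classes": ["ntp"],
    "ntp::service_ensure": "running",
    "ntp::service_enable": True,
    "ntp::servers": ["0.pool.ntp.org iburst burst", "1.pool.ntp.org iburst burst"],
}


def declare_classes(data, class_defaults):
    """Model of hiera_include('classes') plus automatic parameter lookup."""
    catalog = {}
    for cls in data.get("classes", []):
        if cls not in class_defaults:
            raise LookupError(f"could not find class {cls}")
        params = {}
        for name, default in class_defaults[cls].items():
            # Look up the fully qualified name; fall back to the class default.
            params[name] = data.get(f"{cls}::{name}", default)
        catalog[cls] = params
    return catalog


print(declare_classes(HIERA_DATA, CLASS_DEFAULTS))
```

The lookup is driven by the declared classes, not by the keys present in the data, which is why fully qualified names are required: the class name is the only way to tell which parameter a key belongs to.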
I should say here that the word classes (like any other) carries no special or reserved meaning in Hiera's YAML files. Instead of classes you can use any other word, for example production_classes, my_classes, or my-%{::environment}. Yes, the last one works too: Puppet variables can also be used in the names of Hiera sections and in hash keys. Variables can likewise be interpolated into string values, including strings inside hashes and arrays.

Thus, we have effectively moved the ntp service parameters out of the Puppet manifest and into the Hiera hierarchy. Now, in accordance with the hierarchy described at the beginning of the article, these ntp parameters will be applied to absolutely all nodes in your infrastructure. If you want to override them at the higher level of an environment or of a specific server, you can easily do so by setting the values you need at the appropriate level of the hierarchy.

In fact, this is not the only way to import data from Hiera into Puppet automatically.

The previous method has one significant drawback: it is too automatic. On simple configurations we can easily predict its behavior, but with a large number of hosts it is not always possible to say with certainty what adding one more class to the list of included classes will do. For example, suppose you want to use the puppetlabs-apache module to add a specific Apache configuration to some nodes. If you add the harmless-looking lines
classes:
  - apache
to the production.yaml file, this will install, configure and start Apache on all production hosts. Moreover, the apache module will wipe out any Apache configuration that existed on them before.

Such is the fun default behavior! So a simple 'include apache' can sometimes be costly if you don't read the documentation.

But what can we do?! List apache in the YAML of only the nodes we need? That does not feel very centralized...
To let us choose what to include and what not to, Puppet provides the create_resources() function. Its use is nicely described here.
The function create_resources(resource, hash1, hash2) declares Puppet resources of type resource from the entries of hash1: each key of hash1 becomes a resource title, and its value is a hash of that resource's parameters. hash2 is optional; if specified, its keys and values are added to each resource as defaults. If the same parameter is specified in both hash1 and hash2, hash1 takes priority. The resource can be either a standard type (see the Puppet type reference) or a defined type declared by us or in a module. An example of a standard resource is user; an example of a defined type is apache::vhost from the apache module. Consider the Apache example (here I will allow myself to copy-paste a good example from the link above).
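
The way the two hashes combine can be modeled in Python. This sketch only imitates the merging rule described above; it declares nothing in a real catalog, and the sample users and default values are hypothetical.

```python
def create_resources(resource_type, resources, defaults=None):
    """Toy model: return (type, title, parameters) for each entry in resources."""
    defaults = defaults or {}
    declared = []
    for title, params in resources.items():
        # Per-resource parameters (hash1) override the defaults (hash2).
        declared.append((resource_type, title, {**defaults, **(params or {})}))
    return declared


# Hypothetical data: user1 sets its own shell, user2 relies on the default.
users = {
    "user1": {"uid": 10001, "shell": "/bin/sh"},
    "user2": {"uid": 10002},
}
defaults = {"ensure": "present", "shell": "/bin/bash"}

for resource in create_resources("user", users, defaults):
    print(resource)
```

Here user1 keeps /bin/sh because hash1 has priority, while user2 picks up /bin/bash from the defaults, and both receive ensure => present.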

Suppose we want to transfer the following configuration of two Apache virtual hosts to Hiera:
apache::vhost { 'foo.example.com':
  port          => '80',
  docroot       => '/var/www/foo.example.com',
  docroot_owner => 'foo',
  docroot_group => 'foo',
  options       => ['Indexes','FollowSymLinks','MultiViews'],
  proxy_pass    => [ { 'path' => '/a', 'url' => 'http://backend-a/' } ],
}
apache::vhost { 'bar.example.com':
  port    => '80',
  docroot => '/var/www/bar.example.com',
}

In Hiera, it will look like this:
apache::vhosts:
  foo.example.com:
    port: 80
    docroot: /var/www/foo.example.com
    docroot_owner: foo
    docroot_group: foo
    options:
      - Indexes
      - FollowSymLinks
      - MultiViews
    proxy_pass:
      -
        path: '/a'
        url: 'http://localhost:8080/a'
  bar.example.com:
    port: 80
    docroot: /var/www/bar.example.com
All that remains to be written in the Puppet manifest is:
$myvhosts = hiera('apache::vhosts', {})
create_resources('apache::vhost', $myvhosts)
Here, in the first line, we asked Hiera to load the entire configuration from the apache::vhosts section. The data was loaded as two hashes, 'foo.example.com' and 'bar.example.com' (to be exact, the $myvhosts variable received one unnamed hash containing two named hashes). These hashes were then passed one by one to the apache::vhost resource, which makes Puppet create them.

Another good example of moving data from manifests to Hiera is user management. Suppose you write the following in Hiera:
users:
  user1:
     ensure: present
     home: /home/user1
     shell: /bin/sh
     uid: 10001
     managehome: true
  user2:
     ensure: present
     home: /home/user2
     shell: /bin/sh
     uid: 10002
     groups:
       - secondary_group1
       - secondary_group2
  user3:
     ensure: present
     home: /home/user3
     shell: /bin/sh
     uid: 10003
     groups:
       - secondary_group3
       - secondary_group4

And then in site.pp write:
$node_users = hiera_hash('users')
create_resources(user, $node_users, {})
this will create all of the users above. Note that the hiera_hash call collects all the users declared under the users: key from your entire hierarchy. If conflicts arise somewhere (different UIDs for the same user in different files), Hiera takes the value from the higher level of the hierarchy. Which is logical.

Also, create_resources() together with defined types is one way to organize iteration in Puppet, which originally lacks this feature (at least without the future parser; you aren't crazy enough to use it yet, are you?). Both iteration methods are well described here.

That's all for a start. I have covered the basics of using Hiera. With the standard Puppet functions hiera(), hiera_array(), hiera_hash(), hiera_include() and create_resources(), as you have probably guessed, you can come up with a lot of things.
In the next article I will try to describe managing server roles with Puppet and Hiera.
