
How the Prometheus Operator works in Kubernetes
This article is based on our internal documentation for DevOps engineers and explains how Prometheus works when managed by the Prometheus Operator in the Kubernetes clusters we deploy and maintain.

At first glance, Prometheus may seem like a rather complicated product, but, like any well-designed system, it consists of clearly delineated functional components and essentially does only three things: a) collects metrics, b) evaluates rules, c) stores the results in a time series database. This article is devoted not so much to Prometheus itself as to integrating it with Kubernetes, for which we actively use an auxiliary tool called the Prometheus Operator. But we still need to start with Prometheus itself...
Prometheus: what does it do?
Let's look at the first two functions of Prometheus in more detail. They work as follows:
- For each monitoring target, every scrape_interval an HTTP request is made to that target. The response contains metrics in Prometheus' own format, and they are stored in the database.
- Every evaluation_interval the rules are evaluated, and based on them:
  - either alerts are sent,
  - or new metrics (the results of the rules) are written back into Prometheus' own database (a sketch of such a rule file is shown right after this list).
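For illustration, here is a minimal sketch of what a rule file might contain; the metric names, expressions and thresholds are made up purely for the example:

groups:
- name: example.rules
  rules:
  # A recording rule: its result is written back into Prometheus' own database as a new metric
  - record: instance:node_cpu_utilisation:rate5m
    expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
  # An alerting rule: when the expression holds long enough, an alert is sent
  - alert: HighCpuUsage
    expr: instance:node_cpu_utilisation:rate5m > 0.9
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "CPU usage on {{ $labels.instance }} has been above 90% for 10 minutes"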
Prometheus: how is it configured?
The Prometheus server has a config file and rule files.
The config contains the following sections:
- scrape_configs: settings for discovering targets to monitor (see the next section for details);
- rule_files: a list of directories from which the rules are loaded:

  rule_files:
  - /etc/prometheus/rules/rules-0/*
  - /etc/prometheus/rules/rules-1/*

- alerting: settings for discovering the Alertmanagers to which alerts are sent. The section is very similar to scrape_configs, with the difference that the result of its work is a list of endpoints to which Prometheus will send alerts (a sketch of this section follows the list).
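As an illustration, here is roughly what the alerting section might look like when Alertmanager also runs inside the cluster; the namespace and service name below are assumptions:

alerting:
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring                  # assumed namespace of the Alertmanager service
    relabel_configs:
    # keep only the endpoints that belong to a service named "alertmanager"
    - source_labels: [__meta_kubernetes_service_name]
      regex: alertmanager
      action: keep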
Prometheus: where does the list of targets come from?
The general algorithm of how Prometheus works is as follows:

- Prometheus reads the scrape_configs section of the config and configures its internal Service Discovery mechanism accordingly.
- The Service Discovery mechanism interacts with the Kubernetes API (mainly to obtain endpoints).
- Based on the data from Kubernetes, the Service Discovery mechanism updates Targets (the list of targets).
The scrape_configs section lists scrape jobs (an internal Prometheus concept), each of which is defined as follows:

scrape_configs:
# General settings
- job_name: kube-prometheus/custom/0  # just the name of the scrape job;
                                      # it is shown in the Service Discovery section
  scrape_interval: 30s                # how often to collect data
  scrape_timeout: 10s                 # per-request timeout
  metrics_path: /metrics              # path to request
  scheme: http                        # http or https

  # Service Discovery settings
  kubernetes_sd_configs:              # means that targets are obtained from Kubernetes
  - api_server: null                  # use the API server address from the environment
                                      # variables (which are present in every pod)
    role: endpoints                   # take targets from endpoints
    namespaces:
      names:                          # look for endpoints only in these namespaces
      - foo
      - baz

  # "Filtering" settings (which endpoints to keep and which to drop) and "relabeling"
  # settings (which labels to add or remove, for all collected metrics)
  relabel_configs:
  # Filter by the value of the prometheus_custom_target label
  # taken from the service associated with the endpoint
  - source_labels: [__meta_kubernetes_service_label_prometheus_custom_target]
    regex: .+                         # any NON-empty label matches
    action: keep
  # Filter by port name
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    regex: http-metrics               # matches if the port is named http-metrics
    action: keep
  # Add the job label, using the value of the prometheus_custom_target label
  # of the service, prefixed with "custom-"
  #
  # The job label is a built-in Prometheus label. It defines the name of the group
  # under which the target is shown on the targets page, and it is attached to every
  # metric collected from these targets (so that it can be conveniently used for
  # filtering in rules and dashboards)
  - source_labels: [__meta_kubernetes_service_label_prometheus_custom_target]
    regex: (.*)
    target_label: job
    replacement: custom-$1
    action: replace
  # Add the namespace label
  - source_labels: [__meta_kubernetes_namespace]
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  # Add the service label
  - source_labels: [__meta_kubernetes_service_name]
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  # Add the instance label (it will contain the pod name)
  - source_labels: [__meta_kubernetes_pod_name]
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
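For a concrete picture, here is a sketch of a Service that such a scrape job would discover; the names are hypothetical, but the label and the port name correspond to the filters above:

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: foo                      # one of the namespaces listed above
  labels:
    prometheus_custom_target: my-app  # non-empty label, so it passes the first filter;
                                      # the job label will become "custom-my-app"
spec:
  selector:
    app: my-app
  ports:
  - name: http-metrics                # the port name must match the second filter
    port: 8080
    targetPort: 8080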
Thus, Prometheus itself keeps track of:
- pods being added and removed (when pods are added or removed, Kubernetes changes the endpoints, and Prometheus sees this and adds or removes targets);
- services (more precisely, endpoints) being added and removed in the specified namespaces.
Changing the config is required in the following cases:
- you need to add a new scrape config (usually this is a new kind of service that needs to be monitored);
- the list of namespaces needs to be changed.
Having dealt with the basics of Prometheus, let's move on to its "operator": an auxiliary component for Kubernetes that simplifies deploying and operating Prometheus in the realities of a cluster.
Prometheus Operator: what does it do?
To achieve this "simplification", first of all, the Prometheus Operator uses the CRD (Custom Resource Definitions) mechanism to define three resources:
- prometheus: defines a Prometheus installation (cluster); a minimal example is shown after this list;
- servicemonitor: defines how to monitor a set of services (i.e. how to collect their metrics);
- alertmanager: defines a cluster of Alertmanagers (we don't use them, because we send alerts directly to our own notification system, which receives, aggregates and ranks data from a variety of sources, including integrations with Slack and Telegram).
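For reference, a minimal sketch of a prometheus resource; the names and selector labels are illustrative, and the exact set of fields depends on the operator version (see its documentation):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
  namespace: monitoring
spec:
  replicas: 2
  serviceAccountName: prometheus      # assumed to exist with the required RBAC
  serviceMonitorSelector:             # which Service Monitors this Prometheus uses
    matchLabels:
      prometheus: main
  ruleSelector:                       # which ConfigMaps with rules to pick up
    matchLabels:
      prometheus: main
  resources:
    requests:
      memory: 400Mi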
Second, the operator watches prometheus resources and generates for each of them:
- a StatefulSet (with Prometheus itself);
- a Secret with prometheus.yaml (the Prometheus config) and configmaps.json (the config for prometheus-config-reloader).
Finally, the operator also watches servicemonitor resources and ConfigMaps with rules, and based on them updates the prometheus.yaml and configmaps.json configs (both are stored in the Secret).
What is inside the Prometheus pod?
The pod consists of two containers:
- prometheus: Prometheus itself;
- prometheus-config-reloader: a wrapper that watches for changes in prometheus.yaml and, when necessary, triggers a reload of the Prometheus configuration (with a special HTTP request, see details below); it also watches the ConfigMaps with rules (they are listed in configmaps.json, see details below) and, when necessary, downloads them and reloads Prometheus.

The pod uses three volumes (an abridged sketch of how they are wired follows this list):
- config: the mounted Secret (two files: prometheus.yaml and configmaps.json). Mounted in both containers;
- rules: an emptyDir that prometheus-config-reloader writes to and prometheus reads from. Mounted in both containers, but in prometheus in read-only mode;
- data: Prometheus data. Mounted only in prometheus.
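To make the volume layout more tangible, here is a heavily abridged sketch of the pod template in the generated StatefulSet; the mount paths and Secret name are illustrative, the real manifest is produced by the operator:

spec:
  containers:
  - name: prometheus
    volumeMounts:
    - name: config
      mountPath: /etc/prometheus/config
    - name: rules
      mountPath: /etc/prometheus/rules
      readOnly: true                  # prometheus only reads the rules
    - name: data
      mountPath: /prometheus
  - name: prometheus-config-reloader
    volumeMounts:
    - name: config
      mountPath: /etc/prometheus/config
    - name: rules
      mountPath: /etc/prometheus/rules
  volumes:
  - name: config
    secret:
      secretName: prometheus-main     # the Secret generated by the operator
  - name: rules
    emptyDir: {}
  - name: data
    emptyDir: {}                      # or a PVC, depending on the storage settings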
How are Service Monitors handled?

- The Prometheus Operator reads the Service Monitors (and also watches for their addition, deletion and modification). Which Service Monitors to use is specified in the prometheus resource itself (see the documentation for details). An example ServiceMonitor is shown after this list.
- For each Service Monitor that does not specify an explicit list of namespaces (i.e. has any: true set), the Prometheus Operator computes (via the Kubernetes API) the list of namespaces containing Services whose labels match those specified in the Service Monitor.
- Based on the servicemonitor resources it has read (see the documentation) and on the computed namespaces, the Prometheus Operator generates a part of the config (the scrape_configs section) and saves the config in the corresponding Secret.
- By standard Kubernetes means, the data from the Secret reaches the pod (the prometheus.yaml file is updated).
- prometheus-config-reloader notices the change to the file and sends an HTTP request to Prometheus asking it to reload.
- Prometheus re-reads the config, sees the changes in scrape_configs, and processes them according to its logic of operation (see above).
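Here is a sketch of a ServiceMonitor that would produce a scrape job similar to the one shown earlier; all names and labels are illustrative:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: custom
  namespace: monitoring
  labels:
    prometheus: main                  # must match the serviceMonitorSelector of the prometheus resource
spec:
  selector:
    matchExpressions:
    - key: prometheus_custom_target   # select Services that carry this label,
      operator: Exists                # whatever its value
  namespaceSelector:
    any: true                         # let the operator compute the list of namespaces
  endpoints:
  - port: http-metrics                # port name on the Service
    interval: 30s
    path: /metrics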
How are ConfigMaps with rules handled?

- The Prometheus Operator watches the ConfigMaps matching the ruleSelector specified in the prometheus resource (an example of such a ConfigMap is shown after this list).
- When a new (or existing) ConfigMap appears, the Prometheus Operator updates prometheus.yaml, and then the same logic as for Service Monitors processing is triggered (see above).
- Both when a ConfigMap is added or removed and when its contents change, the Prometheus Operator updates the configmaps.json file (it lists the ConfigMaps and their checksums).
- By standard Kubernetes means, the data from the Secret reaches the pod (the configmaps.json file is updated).
- prometheus-config-reloader notices the change to the file and downloads the changed ConfigMaps into the rules directory (the emptyDir).
- The same prometheus-config-reloader sends an HTTP request to Prometheus asking it to reload.
- Prometheus re-reads the config and sees the changed rules.
That's all!
I plan to talk in more detail about how we use Prometheus (and not only it) for monitoring in Kubernetes at the RootConf 2018 conference, which will be held on May 28 and 29 in Moscow. Come listen and chat!
PS
Read also in our blog:
- “Monitoring and Kubernetes (review and video report)”;
- “Operators for Kubernetes: how to run stateful applications”;
- “Monitoring with Prometheus in Kubernetes in 15 minutes”;
- “Kubernetes success stories in production. Part 4: SoundCloud (the authors of Prometheus)”;
- “Introducing loghouse - an open source system for working with logs in Kubernetes”;
- “Our experience with Kubernetes in small projects” (a video report that includes an introduction to how Kubernetes works internally);
- “Infrastructure with Kubernetes as an affordable service”.