
How the Prometheus Operator works in Kubernetes
This article is based on our internal documentation for DevOps engineers and explains how Prometheus works when managed by the Prometheus Operator in the Kubernetes clusters we deploy and maintain.

At first glance, Prometheus may seem like a rather complicated product, but, like any well-designed system, it consists of clearly delineated functional components and essentially does only three things: a) collects metrics, b) evaluates rules, c) stores the results in a time series database. This article is devoted not so much to Prometheus itself as to integrating it with Kubernetes, for which we actively use an auxiliary tool called the Prometheus Operator. But we still need to start with Prometheus itself...
Prometheus: what does it do?
Let's look at the first two functions of Prometheus in more detail. They work as follows:
- For each monitoring target, every scrape_interval an HTTP request is made to that target. The response contains metrics in Prometheus' own format, and they are stored in the database.
- Every evaluation_interval the rules are evaluated, and based on them:
  - either alerts are sent,
  - or new metrics (the results of the rules) are written back into Prometheus' own database (a sketch of such a rule file is shown right after this list).
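For illustration, here is a minimal sketch of what a rule file might contain; the metric names, expressions and thresholds are made up purely for the example:

groups:
- name: example.rules
  rules:
  # A recording rule: its result is written back into Prometheus' own database as a new metric
  - record: instance:node_cpu_utilisation:rate5m
    expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
  # An alerting rule: when the expression holds long enough, an alert is sent
  - alert: HighCpuUsage
    expr: instance:node_cpu_utilisation:rate5m > 0.9
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "CPU usage on {{ $labels.instance }} has been above 90% for 10 minutes"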
Prometheus: how is it configured?
The Prometheus server has a config file and rule files.
The config contains the following sections:
- scrape_configs: settings for discovering targets to monitor (see the next section for details);
- rule_files: a list of directories from which the rules are loaded:

  rule_files:
  - /etc/prometheus/rules/rules-0/*
  - /etc/prometheus/rules/rules-1/*

- alerting: settings for discovering the Alertmanagers to which alerts are sent. The section is very similar to scrape_configs, with the difference that the result of its work is a list of endpoints to which Prometheus will send alerts (a sketch of this section follows the list).
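As an illustration, here is roughly what the alerting section might look like when Alertmanager also runs inside the cluster; the namespace and service name below are assumptions:

alerting:
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring                  # assumed namespace of the Alertmanager service
    relabel_configs:
    # keep only the endpoints that belong to a service named "alertmanager"
    - source_labels: [__meta_kubernetes_service_name]
      regex: alertmanager
      action: keep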
Prometheus: where does the list of targets come from?
The general algorithm of how Prometheus works is as follows:

- Prometheus reads the scrape_configs section of the config and configures its internal Service Discovery mechanism accordingly.
- The Service Discovery mechanism interacts with the Kubernetes API (mainly to obtain endpoints).
- Based on the data from Kubernetes, the Service Discovery mechanism updates Targets (the list of targets).
The scrape_configs section lists scrape jobs (an internal Prometheus concept), each of which is defined as follows:

scrape_configs:
# General settings
- job_name: kube-prometheus/custom/0  # just the name of the scrape job;
                                      # it is shown in the Service Discovery section
  scrape_interval: 30s                # how often to collect data
  scrape_timeout: 10s                 # per-request timeout
  metrics_path: /metrics              # path to request
  scheme: http                        # http or https

  # Service Discovery settings
  kubernetes_sd_configs:              # means that targets are obtained from Kubernetes
  - api_server: null                  # use the API server address from the environment
                                      # variables (which are present in every pod)
    role: endpoints                   # take targets from endpoints
    namespaces:
      names:                          # look for endpoints only in these namespaces
      - foo
      - baz

  # "Filtering" settings (which endpoints to keep and which to drop) and "relabeling"
  # settings (which labels to add or remove, for all collected metrics)
  relabel_configs:
  # Filter by the value of the prometheus_custom_target label
  # taken from the service associated with the endpoint
  - source_labels: [__meta_kubernetes_service_label_prometheus_custom_target]
    regex: .+                         # any NON-empty label matches
    action: keep
  # Filter by port name
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    regex: http-metrics               # matches if the port is named http-metrics
    action: keep
  # Add the job label, using the value of the prometheus_custom_target label
  # of the service, prefixed with "custom-"
  #
  # The job label is a built-in Prometheus label. It defines the name of the group
  # under which the target is shown on the targets page, and it is attached to every
  # metric collected from these targets (so that it can be conveniently used for
  # filtering in rules and dashboards)
  - source_labels: [__meta_kubernetes_service_label_prometheus_custom_target]
    regex: (.*)
    target_label: job
    replacement: custom-$1
    action: replace
  # Add the namespace label
  - source_labels: [__meta_kubernetes_namespace]
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  # Add the service label
  - source_labels: [__meta_kubernetes_service_name]
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  # Add the instance label (it will contain the pod name)
  - source_labels: [__meta_kubernetes_pod_name]
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
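For a concrete picture, here is a sketch of a Service that such a scrape job would discover; the names are hypothetical, but the label and the port name correspond to the filters above:

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: foo                      # one of the namespaces listed above
  labels:
    prometheus_custom_target: my-app  # non-empty label, so it passes the first filter;
                                      # the job label will become "custom-my-app"
spec:
  selector:
    app: my-app
  ports:
  - name: http-metrics                # the port name must match the second filter
    port: 8080
    targetPort: 8080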
Thus, Prometheus itself keeps track of:
- pods being added and removed (when pods are added or removed, Kubernetes changes the endpoints, and Prometheus sees this and adds or removes targets);
- services (more precisely, endpoints) being added and removed in the specified namespaces.
Changing the config is required in the following cases:
- you need to add a new scrape config (usually this is a new kind of service that needs to be monitored);
- the list of namespaces needs to be changed.
Having dealt with the basics of Prometheus, let's move on to its "operator": an auxiliary component for Kubernetes that simplifies deploying and operating Prometheus in the realities of a cluster.
Prometheus Operator: what does it do?
To achieve this "simplification", first of all, the Prometheus Operator uses the CRD (Custom Resource Definitions) mechanism to define three resources:
- prometheus: defines a Prometheus installation (cluster); a minimal example is shown after this list;
- servicemonitor: defines how to monitor a set of services (i.e. how to collect their metrics);
- alertmanager: defines a cluster of Alertmanagers (we don't use them, because we send alerts directly to our own notification system, which receives, aggregates and ranks data from a variety of sources, including integrations with Slack and Telegram).
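For reference, a minimal sketch of a prometheus resource; the names and selector labels are illustrative, and the exact set of fields depends on the operator version (see its documentation):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
  namespace: monitoring
spec:
  replicas: 2
  serviceAccountName: prometheus      # assumed to exist with the required RBAC
  serviceMonitorSelector:             # which Service Monitors this Prometheus uses
    matchLabels:
      prometheus: main
  ruleSelector:                       # which ConfigMaps with rules to pick up
    matchLabels:
      prometheus: main
  resources:
    requests:
      memory: 400Mi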
Second, the operator watches prometheus resources and generates for each of them:
- a StatefulSet (with Prometheus itself);
- a Secret with prometheus.yaml (the Prometheus config) and configmaps.json (the config for prometheus-config-reloader).
Finally, the operator also watches servicemonitor resources and ConfigMaps with rules, and based on them updates the prometheus.yaml and configmaps.json configs (both are stored in the Secret).
What is inside the Prometheus pod?
The pod consists of two containers:
- prometheus: Prometheus itself;
- prometheus-config-reloader: a wrapper that watches for changes in prometheus.yaml and, when necessary, triggers a reload of the Prometheus configuration (with a special HTTP request, see details below); it also watches the ConfigMaps with rules (they are listed in configmaps.json, see details below) and, when necessary, downloads them and reloads Prometheus.

The pod uses three volumes (an abridged sketch of how they are wired follows this list):
- config: the mounted Secret (two files: prometheus.yaml and configmaps.json). Mounted in both containers;
- rules: an emptyDir that prometheus-config-reloader writes to and prometheus reads from. Mounted in both containers, but in prometheus in read-only mode;
- data: Prometheus data. Mounted only in prometheus.
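To make the volume layout more tangible, here is a heavily abridged sketch of the pod template in the generated StatefulSet; the mount paths and Secret name are illustrative, the real manifest is produced by the operator:

spec:
  containers:
  - name: prometheus
    volumeMounts:
    - name: config
      mountPath: /etc/prometheus/config
    - name: rules
      mountPath: /etc/prometheus/rules
      readOnly: true                  # prometheus only reads the rules
    - name: data
      mountPath: /prometheus
  - name: prometheus-config-reloader
    volumeMounts:
    - name: config
      mountPath: /etc/prometheus/config
    - name: rules
      mountPath: /etc/prometheus/rules
  volumes:
  - name: config
    secret:
      secretName: prometheus-main     # the Secret generated by the operator
  - name: rules
    emptyDir: {}
  - name: data
    emptyDir: {}                      # or a PVC, depending on the storage settings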
How are Service Monitors handled?

- The Prometheus Operator reads the Service Monitors (and also watches for their addition, deletion and modification). Which Service Monitors to use is specified in the prometheus resource itself (see the documentation for details). An example ServiceMonitor is shown after this list.
- For each Service Monitor that does not specify an explicit list of namespaces (i.e. has any: true set), the Prometheus Operator computes (via the Kubernetes API) the list of namespaces containing Services whose labels match those specified in the Service Monitor.
- Based on the servicemonitor resources it has read (see the documentation) and on the computed namespaces, the Prometheus Operator generates a part of the config (the scrape_configs section) and saves the config in the corresponding Secret.
- By standard Kubernetes means, the data from the Secret reaches the pod (the prometheus.yaml file is updated).
- prometheus-config-reloader notices the change to the file and sends an HTTP request to Prometheus asking it to reload.
- Prometheus re-reads the config, sees the changes in scrape_configs, and processes them according to its logic of operation (see above).
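Here is a sketch of a ServiceMonitor that would produce a scrape job similar to the one shown earlier; all names and labels are illustrative:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: custom
  namespace: monitoring
  labels:
    prometheus: main                  # must match the serviceMonitorSelector of the prometheus resource
spec:
  selector:
    matchExpressions:
    - key: prometheus_custom_target   # select Services that carry this label,
      operator: Exists                # whatever its value
  namespaceSelector:
    any: true                         # let the operator compute the list of namespaces
  endpoints:
  - port: http-metrics                # port name on the Service
    interval: 30s
    path: /metrics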
How are ConfigMaps with rules handled?

- The Prometheus Operator watches the ConfigMaps matching the ruleSelector specified in the prometheus resource (an example of such a ConfigMap is shown after this list).
- When a new (or existing) ConfigMap appears, the Prometheus Operator updates prometheus.yaml, and then the same logic as for Service Monitors processing is triggered (see above).
- Both when a ConfigMap is added or removed and when its contents change, the Prometheus Operator updates the configmaps.json file (it lists the ConfigMaps and their checksums).
- By standard Kubernetes means, the data from the Secret reaches the pod (the configmaps.json file is updated).
- prometheus-config-reloader notices the change to the file and downloads the changed ConfigMaps into the rules directory (the emptyDir).
- The same prometheus-config-reloader sends an HTTP request to Prometheus asking it to reload.
- Prometheus re-reads the config and sees the changed rules.
That's all!
I plan to talk in more detail about how we use Prometheus (and not only it) for monitoring in Kubernetes at the RootConf 2018 conference, which will be held on May 28 and 29 in Moscow. Come listen and chat!
PS
Read also in our blog:
- “Monitoring and Kubernetes (review and video report)”;
- “Operators for Kubernetes: how to run stateful applications”;
- “Monitoring with Prometheus in Kubernetes in 15 minutes”;
- “Kubernetes success stories in production. Part 4: SoundCloud (the authors of Prometheus)”;
- “Introducing loghouse - an open source system for working with logs in Kubernetes”;
- “Our experience with Kubernetes in small projects” (a video report that includes an introduction to how Kubernetes works internally);
- “Infrastructure with Kubernetes as an affordable service”.