Collection and visualization of application metrics in Graphite and Graph-Explorer

Often you need to monitor various parameters of an application or service. Of interest are, for example, the number of requests per second, the average server response time, the number of server responses with each HTTP status (technical metrics), the number of user registrations per hour, the number of payment transactions per minute (business metrics), and so on. Without a metrics collection system, product development and maintenance proceed almost blindly.



This article is a guide to setting up a system for collecting and analyzing application metrics based on Graphite and vimeo/graph-explorer.

Motivation


The metrics collection system is not a monolith. When deploying it, you have to deal with a significant number of components, each of which somehow interacts with the rest and has its own configuration file and its own way of being started. Even Graphite by itself consists of at least three subsystems: a metric collection daemon (carbon), a metrics database (whisper and others), and a web application for visualization. When graph-explorer support needs to be added, things get even more interesting. Each subsystem has its own separate documentation, but nowhere is there a document describing the whole picture.

Metrics


A metric is a sequence of (numeric) values over time, a very simple thing in essence. In effect there is some string key and a series (sample 1, time 1), (sample 2, time 2), ... corresponding to it. The typical way to name metrics in Graphite is to split the string key into parts with the "." character, for example stats.web.request.GET.time. When plotting graphs, Graphite lets you group metrics that share a common prefix using "*". Obviously, this is far from the most flexible way to work with keys. If you need to add another component to a key, graphing can break: for example, changing the key from the example above into stats.web.server1.request.GET.time destroys the common prefix for the historical data. The second major drawback of this naming scheme is the potential ambiguity in interpreting metrics. Metrics with keys like service=web server=server1 what=request_time unit=ms would be much more self-describing, with the added ability to build combined plots from common tags rather than just common prefixes.

Luckily, the folks at vimeo came up with metrics 2.0 and built graph-explorer, which works on top of Graphite. The main idea is a logical representation of a metric as an entity with a set of tag-value pairs. Each metric in the 2.0 format is ultimately converted into an ordinary dot-separated string key and ends up in carbon, but first an "index" entry is created in a separate storage, recording the correspondence between these keys and the tag-value pairs. Using the information from this index, graph-explorer can combine different metrics on a single chart.
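
For illustration, turning a set of tag-value pairs into such a key is a one-liner. A minimal sketch of the tag_is_value encoding used later in this article (the helper name is made up):

# build a metrics 2.0 style key from tag-value pairs: tag_is_value
# segments joined with dots; sorting keeps the key stable between runs
def metrics20_key(tags):
    return '.'.join('%s_is_%s' % (k, v) for k, v in sorted(tags.items()))

print(metrics20_key({'service': 'web', 'server': 'server1',
                     'what': 'request_time', 'unit': 'ms'}))
# server_is_server1.service_is_web.unit_is_ms.what_is_request_time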

General view


In general, the metrics collection system can be represented by the following diagram:

Thus, an application (web service, daemon, etc.), written in any language, sends metrics through some interface (layer) to the collector; the collector partially aggregates them, optionally calculates update rates, and periodically flushes them to carbon, which gradually puts them into storage. A web application pulls the data from storage (and partially from carbon itself) and builds graphs for us.

The carbon daemon is actually three daemons: carbon-cache, carbon-relay and carbon-aggregator. In the simplest case carbon-cache alone will do. carbon-relay can be used for sharding (load balancing between several carbon-caches) or replication (sending the same metrics to several carbon-caches). carbon-aggregator can perform intermediate processing of metrics before sending them to storage. Metric data can be handed to carbon in one of two formats: plain text (the so-called line protocol) on port 2003, or serialized with pickle on port 2004. Note that carbon-relay outputs data only as pickle (important!).
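
The line protocol is trivial: one "key value timestamp" triple per line. A minimal sketch of sending a single point straight to carbon-cache, assuming it listens on localhost:2003 (the key and value are illustrative):

import socket
import time

# line protocol: "<metric key> <value> <unix timestamp>\n"
line = "stats.web.request.GET.time 42 %d\n" % int(time.time())
sock = socket.create_connection(("127.0.0.1", 2003))
sock.sendall(line.encode())
sock.close()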

The graph-explorer add-on introduces one more store, for the so-called metric index. Elasticsearch is used as this storage. Obviously, somewhere in the system shown on the diagram a link must be added that will "index" the metrics. That link is carbon-tagger. As a result, the system takes the following form:


Technology stack


What follows is specific to this setup; in your case some of the components may well be replaced by other solutions.

The system is aimed at collecting metrics 2.0 specifically, for subsequent use in graph-explorer.

Installation


Installation will take place in the /opt/graphite directory, which is the default. Some of the components are written in Go, so Go will also have to be installed beforehand and the appropriate environment variables set up. The steps below assume this is the only Go on the system; if you have several versions of Go installed, skip this step and configure the desired version as you see fit.

cd /opt
wget https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.4.2.linux-amd64.tar.gz
echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
echo 'export GOPATH=/opt/go' >> /etc/profile
echo 'export GOBIN="$GOPATH/bin"' >> /etc/profile
echo 'export PATH=$PATH:$GOBIN' >> /etc/profile
source /etc/profile
go env 
# The output should contain the following lines:
#GOBIN="/opt/go/bin"
#GOPATH="/opt/go"
#GOROOT="/usr/local/go"

Another important point when deploying the system is to keep careful track of the ports. There are many components, each uses several ports, and it is easy to mix up which one should talk to which. On top of that, most components have no built-in authorization mechanism whatsoever, yet by default they listen on 0.0.0.0. I therefore strongly recommend binding to the local interface wherever possible and closing all ports on the server with iptables.

statsd python client


statsd, the most popular client, was chosen for sending metrics from the application. It is extremely simple: in essence, it sends text data to a specified UDP/TCP port with minimal protocol overhead.

# inside the application's virtual environment
pip install statsd

An example of use in the application code:

import statsd
client = statsd.StatsClient(host='statsdaemon.local', port=8125)
# send a metric in the metrics 2.0 tag_is_value format
client.gauge('service_is_myapp.server_is_web1.what_is_http_request.unit_is_ms', 320)  # 320 is an illustrative value
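
The same client can also time a block of code directly. A minimal sketch using the timer context manager from the statsd package (handle_request stands in for your own code):

# measure the handler's duration and send it as a timer metric
with client.timer('service_is_myapp.server_is_web1.what_is_http_request.unit_is_ms'):
    handle_request()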

statsdaemon


In the "classic" graphite installation scheme, statsd is often used as an intermediate collector. In our case, statsdaemon is used , since it can work with metrics 2.0 out of the box, while maintaining backward compatibility with the statsd protocol. It is written in Go and its installation is extremely simple (carefully, now in README.md there is an annoying error in the installation command):

go get github.com/Vimeo/statsdaemon/statsdaemon

After that, the statsdaemon executable should appear in the /opt/go/bin directory. The settings for this daemon are quite simple:
statsdaemon.ini
# --- /etc/statsdaemon.ini ---
listen_addr = ":8125"  # the statsd client (the application) sends metrics here
admin_addr = ":8126"
graphite_addr = "carbon.local:2013" # where aggregated metrics are flushed every flush_interval seconds (carbon-relay-ng in our setup)
flush_interval = 30
prefix_rates = "stats."
prefix_timers = "stats.timers."
prefix_gauges = "stats.gauges."
percentile_thresholds = "90,75"
max_timers_per_s = 1000


Running statsdaemon:

statsdaemon -config_file="/etc/statsdaemon.ini" -debug=true

At this stage you can already start statsdaemon and send a few packets to it from the application using the statsd client. The console output will speak for itself.
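
You can also poke statsdaemon without the client library at all. A minimal sketch of the statsd line format ("name:value|type") over raw UDP, assuming statsdaemon listens on localhost:8125:

import socket

# raw statsd packet; "ms" marks it as a timer value
packet = b"service_is_myapp.server_is_web1.what_is_http_request.unit_is_ms:320|ms"
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(packet, ("127.0.0.1", 8125))
sock.close()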

Graphite


The current installation guide is here. Installation is best done inside a virtual environment located in /opt/graphite.

sudo apt-get install python-pip python-dev
pip install pip --upgrade
pip install virtualenv
mkdir /opt/graphite
virtualenv /opt/graphite
cd /opt/graphite
source bin/activate
sudo apt-get install libcairo2 python-cairo libffi-dev # install the packages graphite needs
pip install https://github.com/graphite-project/ceres/tarball/master
pip install whisper
pip install carbon # pip install carbon --install-option="--prefix=/opt/graphite" --install-option="--install-lib=/opt/graphite/lib"
pip install graphite-web # pip install graphite-web --install-option="--prefix=/opt/graphite" --install-option="--install-lib=/opt/graphite/webapp"
# needed for the Graphite webapp
pip install uwsgi 
pip install django
pip install cairocffi
pip install django-tagging
# initialize the webapp
(cd /opt/graphite/webapp/graphite; python manage.py syncdb)

After installation, graphite will be located in /opt/graphite. Next, it needs to be configured. Example configuration files are located in /opt/graphite/conf. The minimum that needs to be done is to create the carbon and whisper settings files.

cp /opt/graphite/conf/carbon.conf.example /opt/graphite/conf/carbon.conf
# carbon.conf contains the settings for carbon-cache, carbon-relay and carbon-aggregator.
# At the very least, set the following values in the carbon-cache section:
# LINE_RECEIVER_INTERFACE = 127.0.0.1
# LINE_RECEIVER_PORT = 2003
cp /opt/graphite/conf/storage-schemas.conf.example /opt/graphite/conf/storage-schemas.conf
# storage-schemas.conf holds the settings for whisper, which is essentially a fixed-size db.
# Space for a metric is allocated once, so you must state explicitly (per metric
# key pattern) at what sampling resolution and for how long to keep the data.
...
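
For reference, a minimal storage-schemas.conf entry might look like this (the pattern and retentions are illustrative: points for keys starting with stats. are kept at 10-second resolution for 6 hours, 1-minute for a week, and 10-minute for a year):

[stats]
pattern = ^stats\.
retentions = 10s:6h,1m:7d,10m:1y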

Next, you need to run carbon-cache:

carbon-cache.py --conf=conf/carbon.conf start # --debug
tail -f /opt/graphite/storage/log/carbon-cache/carbon-cache-a/*.log

And the graphite webapp, via uwsgi plus some web server (nginx in our case):

cp /opt/graphite/webapp/graphite/local_settings.py.example /opt/graphite/webapp/graphite/local_settings.py
# in local_settings.py, change at least SECRET_KEY and TIME_ZONE.
/opt/graphite/bin/uwsgi --socket localhost:6001 --master --processes 4 --home /opt/graphite --pythonpath /opt/graphite/webapp/graphite --wsgi-file=/opt/graphite/conf/graphite.wsgi.example --daemonize=/var/log/graphite-uwsgi.log

Nginx settings:
graphite.conf
upstream graphite_upstream {
    server 127.0.0.1:6001;
}
server {
    listen 8085;
    server_name graphite.local;
    location / {
        include            uwsgi_params;
        uwsgi_pass         graphite_upstream;
        add_header 'Access-Control-Allow-Origin' '*';
        add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
        add_header 'Access-Control-Allow-Headers' 'origin, authorization, accept';
        add_header 'Access-Control-Allow-Credentials' 'true';
        proxy_redirect     off;
        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Host $server_name;
    }
}
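
At this point carbon-cache and the webapp can already be checked end to end: send a test point to port 2003 (as in the line-protocol sketch above) and pull it back through the graphite render API. A minimal sketch (host and port are taken from the nginx config above; the target pattern is illustrative):

import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

# the render API returns JSON datapoints for every series matching the target
url = "http://graphite.local:8085/render?target=stats.*&format=json&from=-10min"
print(json.loads(urlopen(url).read().decode()))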


It remains only to install carbon-tagger (the component that populates the index database for graph-explorer) and to configure duplicate delivery of metrics to carbon-cache and carbon-tagger using carbon-relay. Unfortunately, carbon-tagger does not understand the pickle protocol, while carbon-relay sends data only in that format. Therefore, you have to install the drop-in replacement for carbon-relay from vimeo, carbon-relay-ng:

go get -d github.com/graphite-ng/carbon-relay-ng
go get github.com/jteeuwen/go-bindata/...
cd "/opt/go/src/github.com/graphite-ng/carbon-relay-ng"
make
cp carbon-relay-ng /opt/go/bin/carbon-relay-ng
touch /opt/graphite/conf/carbon-relay-ng.ini
cd /opt/graphite
carbon-relay-ng conf/carbon-relay-ng.ini

carbon-relay-ng.ini
instance = "default"
listen_addr = "127.0.0.1:2013"
admin_addr = "127.0.0.1:2014"
http_addr = "127.0.0.1:8081"
spool_dir = "spool"
log_level = "notice"
bad_metrics_max_age = "24h"
init = [
     'addRoute sendAllMatch carbon-default  127.0.0.1:2003 spool=true pickle=false', # send everything to carbon-cache
     'addRoute sendAllMatch carbon-tagger  127.0.0.1:2023 spool=true pickle=false'  # send everything to carbon-tagger
]
[instrumentation]
graphite_addr = ""
graphite_interval = 1000 


carbon-tagger


The carbon-tagger daemon is written in Go and pushes metric index entries to Elasticsearch for later use by graph-explorer. First of all, Java and Elasticsearch must be installed on the server. Then install carbon-tagger:

go get github.com/Vimeo/carbon-tagger
go get github.com/mjibson/party
go build github.com/Vimeo/carbon-tagger

carbon-tagger.conf
[in]
port = 2023 # carbon-relay-ng sends metrics here
[elasticsearch]
host = "esearch.local"
port = 9200
index = "graphite_metrics2"
flush_interval = 2
max_backlog = 10000
max_pending = 5000
[stats]
host = "localhost"
port = 2003 # carbon-tagger sends its own internal metrics here (not the application metrics)
id = "default"
flush_interval = 10
http_addr = "127.0.0.1:8123"


Launch carbon-tagger:

(cd /opt/go/src/github.com/Vimeo/carbon-tagger/; ./recreate_index.sh) # initialize the indices in ES
carbon-tagger -config="/opt/graphite/conf/carbon-tagger.conf" -verbose=true
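
To make sure that indexing actually happens, you can ask Elasticsearch to count the documents in the index (host, port and index name are taken from carbon-tagger.conf above):

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

# _count is a standard Elasticsearch endpoint; a non-zero "count" means
# carbon-tagger is writing metric index documents
print(urlopen("http://esearch.local:9200/graphite_metrics2/_count").read())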

graph-explorer


And finally, install the centerpiece of the whole setup:

pip install graph-explorer

graph_explorer.conf
[graph_explorer]
listen_host = 127.0.0.1 # local address, so that HTTP Basic Auth can be added via nginx
listen_port = 8080
filename_metrics = metrics.json
log_file = /var/log/graph-explorer/graph-explorer.log
[graphite]
url_server = http://localhost
url_client = http://graphite.local:8085 # address of the graphite webapp


nginx / graph-explorer.conf
server {
    listen 80;
    server_name metrics.yourproject.net;
    location / {
        auth_basic           "Who are you?";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://localhost:8080;
    }
}


Running graph-explorer:

mkdir /var/log/graph-explorer
run_graph_explorer.py /opt/graphite/conf/graph_explorer.conf

After that, the graph-explorer web interface will be available at metrics.yourproject.net.

Instead of a conclusion


Stop developing with your eyes closed, %habrauser%! Deploy a metrics collection system and share entertaining graphs from your projects! Thanks for your attention!
