Collectd: keeping track of the system at the lowest cost. Setting up and using notifications

What is it?


Collectd is a small daemon that collects statistics about system resource usage every 10 seconds. It can collect statistics from several hosts and send them to a server that renders nice graphs.

The main difference from other collectors is that it works on the push principle, not poll/pull. That is, the server just sits and listens, and the monitored hosts send their statistics to it.

What will we do?


What I want to describe in this post:
  • Installation.
  • General configuration.
  • Configuration of individual plugins.
  • Setting up client servers that send their statistics to the main server.
  • Setting up email notifications.


Installation


Install it as usual with your favorite package manager: emerge / yum / apt-get, or whatever else is out there.
For Debian: there is no collectd in the standard repositories, so we need to enable backports.
This is quite simple.
Add the line
deb http://backports.debian.org/debian-backports squeeze-backports main

to your sources.list (or create a new file with this line in /etc/apt/sources.list.d/).
Then run apt-get update.

Next, to install a package from backports, run
apt-get -t squeeze-backports install "package"

or, with aptitude,
aptitude -t squeeze-backports install "package"

In our case it looks like this:
apt-get -t squeeze-backports install "collectd"
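
The Debian steps above can be scripted. The following is a minimal sketch, not a tested installer: the function name add_backports and the file name backports.list are assumptions made for illustration, and the optional directory argument exists only so the function can be tried outside /etc/apt.

```shell
#!/bin/sh
# Sketch: add the squeeze-backports line as a drop-in file, at most once.
# add_backports and "backports.list" are hypothetical names.
add_backports() {
    apt_dir=${1:-/etc/apt/sources.list.d}
    line='deb http://backports.debian.org/debian-backports squeeze-backports main'
    list="$apt_dir/backports.list"
    # Append only if the exact line is not there yet, so reruns are harmless.
    if ! grep -qxF "$line" "$list" 2>/dev/null; then
        printf '%s\n' "$line" >> "$list"
    fi
}

# Typical use, as root:
#   add_backports && apt-get update && apt-get -t squeeze-backports install collectd
```

Checking for the line before appending keeps the file from accumulating duplicates if the script is run more than once.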

There is a small nuance on Gentoo. First, the package is masked as ~x86; second, only a few plugins are built by default. To choose which plugins to build, list them either in package.use (in the form collectd_plugins_memory) or in make.conf in the variable COLLECTD_PLUGINS="".
I have these installed:
COLLECTD_PLUGINS="apache cpu df disk interface load memory network ntpd processes notify_email ping logfile syslog rrdtool swap hddtemp exec filecount java sensors target_notification target_set target_replace"

Be careful: depending on the plugins, it can pull in a lot of dependencies ;), so choose only what you need.

Installed versions: on Gentoo, 5.1.1; on Debian, after some fiddling, 4.1.1 (but it has to be updated manually to 5.x; read below for why); on CentOS 6, 5.1.0.

 JFYI  Why you need to upgrade: the RRD data layout differs between these versions, so you would have to either write a conversion workaround or maintain two scripts for generating graphs on the front end. And because the graphs change, you would also have to take the client version on each host into account and write separate notification rules for it.

On Debian and CentOS all plugins were installed, since they come from a prebuilt package :)

Configuration


Moving on. I did not like the config format at all: it took a long time to find anything, so I split it into the parts I need, since a config can include other configs, inline as they say :)
On Gentoo the whole config is in one file, /etc/collectd.conf. On Debian it lives at the nicer path /etc/collectd/collectd.conf, and individual parts of the configuration, such as filters and thresholds, are placed in separate files, which is good news. I made roughly the same layout on my Gentoo box, changing it a little. In particular, loading of the plugins I need is moved to a separate directory, and each plugin (more precisely, its configuration) also sits in its own file. Here is what the result looks like:

# Config file for collectd(1).
#
# Some plugins need additional configuration and are disabled by default.
# Please read collectd.conf(5) for details.
#
# You should also read /usr/share/doc/collectd-core/README.Debian.plugins
# before enabling any more plugins.
Hostname "gen-collectd-master.local"
FQDNLookup true
BaseDir "/data/collectd"
#PluginDir "/usr/lib/collectd"
#TypesDB "/usr/share/collectd/types.db" "/etc/collectd/my_types.db"
#Interval 10
#Timeout 2
#ReadThreads 5
LoadPlugin logfile
LoadPlugin syslog
<Plugin logfile>
        LogLevel "info"
        File "/data/collectd/collectd.log"
        Timestamp true
        PrintSeverity true
</Plugin>
<Plugin syslog>
        LogLevel info
</Plugin>
LoadPlugin network
<Plugin network>
    Listen "192.168.56.130" "8085"
</Plugin>

Include "/etc/collectd/inst/*.active"
Include "/etc/collectd/conf/*.conf"
Include "/etc/collectd/filters.conf"
Include "/etc/collectd/thresholds.conf"

This is the main configuration file. If you compare it with the default one, you will notice that mine does not load all the plugins, only those that I consider part of the core configuration. The rest is included from the inst and conf directories.
 JFYI  Also pay attention to the parameter FQDNLookup true: if something is set in Hostname, it must resolve! Otherwise collectd will exit with an error. The other solution is to set this parameter to false.

The inst directory contains the plugin configuration files:
gen-collectd-master collectd # ls -la /etc/collectd/inst/
total 32
drwxr-xr-x 2 root root 4096 Nov 26 20:57 .
drwxr-xr-x 4 root root 4096 Nov 26 21:00 ..
-rw-r--r-- 1 root root   15 Nov 26 13:54 cpu.active
-rw-r--r-- 1 root root  125 Nov 26 13:54 if.active
-rw-r--r-- 1 root root   16 Nov 26 13:54 load.active
-rw-r--r-- 1 root root   18 Nov 26 13:54 memory.active
-rw-r--r-- 1 root root  122 Nov 26 18:25 mounts.active
-rw-r--r-- 1 root root  133 Nov 26 20:57 ping-hosts.active

As the config shows, I only include files with the "extension" active.

 JFYI  All plugin parameters can be found on the collectd.conf documentation page. Next, the conf directory contains 2 files: one configures the notify_email plugin, the other holds the rrdtool settings.


gen-collectd-master collectd # ls -la /etc/collectd/conf/
total 16
drwxr-xr-x 2 root root 4096 Nov 26 20:30 .
drwxr-xr-x 4 root root 4096 Nov 26 21:00 ..
-rw-r--r-- 1 root root  425 Nov 26 20:30 mail.conf
-rw-r--r-- 1 root root   83 Nov 26 13:54 rrdtool.conf

Strictly speaking, they could just as well be moved back into collectd.conf, but at the time I wanted to do it this way :)

The contents of the conf/rrdtool.conf file:
LoadPlugin rrdtool
<Plugin rrdtool>
        DataDir "/data/collectd/rrd"
</Plugin>

As you can see, here I load the plugin and set its parameters.

The contents of the conf/mail.conf file:
LoadPlugin notify_email
<Plugin notify_email>
        SMTPServer "smtp.mail.ru"
        SMTPPort 25
        SMTPUser "collectd@mail.ru"
        SMTPPassword "my-super-password-for-mail"
        From "collectd@mail.ru"
#       # <WARNING/FAILURE/OK> on <hostname>.
#       # Beware! Do not use more than two placeholders (%)!
        Subject "[collectd] %s on %s!"
        Recipient "recipient@mail.ru"
</Plugin>

We will need this plugin when we configure notifications.

 JFYI  You can write your own notification handler. To do this, load the exec plugin and specify a script that will be run whenever a notification is generated. It is done like this:

LoadPlugin exec
<Plugin exec>
    NotificationExec    thunder "/home/thunder/ttest.sh" "test1"
</Plugin>

The general form of this directive is:
NotificationExec <user> "<command-to-run>" ["argument1"] ["argument2"] etc.


The script contains the following:
#!/bin/bash
cat >> /home/thunder/ttest.log

During a notification, something like this is written to the log:

Severity: WARNING
Time: 1354181979.770
Host: jen-master-local
Plugin: cpu
PluginInstance: 0
Type: cpu
TypeInstance: user
DataSource: value
CurrentValue: 9.989738e+01
WarningMin: nan
WarningMax: 8.500000e+01
FailureMin: nan
FailureMax: nan
Host jen-master-local, plugin cpu (instance 0) type cpu (instance user): Data source "value" is currently 99.897375. That is above the warning threshold of 85.000000.

As we can see, all the data is here; it is easy to parse, so writing your own notifier is not difficult either.
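
As an illustration of how little parsing is needed, here is a rough sketch of a handler that condenses a notification into one line. It assumes the input format shown in the log above: "Key: value" header lines followed by the message text (collectd normally separates the two with an empty line, which the sketch tolerates but does not require). The names is_header_name and summarize_notification, and the output format, are made up for this example.

```shell
#!/bin/sh
# Sketch of a NotificationExec handler. collectd passes "Key: value" header
# lines on stdin, followed by the free-text message.
# summarize_notification and its one-line output format are assumptions.

# Succeeds only if the argument is a non-empty run of letters (a header name).
is_header_name() {
    case $1 in
        ''|*[!A-Za-z]*) return 1 ;;
    esac
    return 0
}

# The body runs in a subshell "( ... )" so its variables do not leak out.
summarize_notification() (
    severity='' host='' plugin='' message='' in_body=0
    while IFS= read -r line; do
        key=${line%%: *}
        if [ "$in_body" -eq 0 ] && [ "$key" != "$line" ] && is_header_name "$key"; then
            value=${line#*: }
            case $key in
                Severity) severity=$value ;;
                Host)     host=$value ;;
                Plugin)   plugin=$value ;;
            esac
        else
            # The first non-header line (or the empty separator) starts the message.
            in_body=1
            if [ -n "$line" ]; then
                message="${message:+$message }$line"
            fi
        fi
    done
    printf '%s %s/%s: %s\n' "$severity" "$host" "$plugin" "$message"
)

# In a real handler script the last line could be, for example:
#   summarize_notification >> /home/thunder/ttest.log
```

Only three header fields are kept here; the other fields from the log (Type, TypeInstance, CurrentValue and so on) can be picked up by adding branches to the case statement.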

Let's go back to the main collectd.conf file. I won't explain syslog / logfile, since everything there is clear; the same goes for Hostname.
The network plugin: you can read about it in more detail in the documentation; in particular, you can set up authentication there. I won't cover that here; everyone can decide for themselves :)
This plugin handles communication between collectd servers.
To configure the current server as the server that collects statistics, set the parameter Listen "192.168.56.130" "8085", where 192.168.56.130 is the IP address on which the daemon will listen for incoming data from other servers, and 8085 is the port it will listen on.
To configure a client, specify Server "192.168.56.130" "8085" instead of Listen, where 192.168.56.130 is the IP address to send data to and 8085 is the port to send it to.

 JFYI  The port can be omitted; port 25826 is then used by default. Just remember that it works over UDP, so keep that in mind if you have a firewall somewhere.
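
For the client side, a small sketch that prints the matching network block may help when rolling the config out to several hosts. The helper name network_client_conf is hypothetical; the address and port in the usage line are the ones used in this post.

```shell
#!/bin/sh
# Sketch: print a client-side network plugin block for collectd.conf.
# network_client_conf is a hypothetical helper. Arguments: server address and
# an optional UDP port (without it collectd uses its default, 25826).
network_client_conf() {
    server=$1
    port=${2:-}
    echo 'LoadPlugin network'
    echo '<Plugin network>'
    if [ -n "$port" ]; then
        printf '    Server "%s" "%s"\n' "$server" "$port"
    else
        printf '    Server "%s"\n' "$server"
    fi
    echo '</Plugin>'
}

# Example:
#   network_client_conf 192.168.56.130 8085
```

The output can be redirected into a separate file and pulled in with Include, in the same way as the other fragments in this setup.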

The configuration of the plugins is the same on both sides.

Everything that you have configured for monitoring on the "Client" will be sent to the "Server".

Mail notifications


Now for the tastiest part. The only examples of configuring notifications for plugins are in the thresholds.conf config.
Loading the plugin, and an example:
LoadPlugin "threshold"
<Plugin "threshold">
  <Type "foo">
     WarningMin    0.00
     WarningMax 1000.00
     FailureMin    0.00
     FailureMax 1200.00
     Invert false
     Instance "bar"
  </Type>
</Plugin>

A brief explanation of how this works. Threshold is a regular plugin, which is why it is loaded like one. All parameters are set inside the <Plugin "threshold"> container. Inside it, containers can be nested in the following order: "Host", "Plugin", "Type". That is, a Host container may contain a Plugin container, which in turn may contain a Type container. The Host block is optional; it lets you bind notifications to a specific host. All values must be set inside the Type block; the only value that can be set outside the Type block is Instance.
If several blocks apply to the same value, the most specific block is used. This way you can define a standard block for a plugin and then, for example, override it with other parameters for a specific host. So, let's proceed to configuring the notifications themselves.
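
To make the nesting and the override rule concrete, here is a sketch that prints a threshold fragment with one general rule and one per-host override. The helper name threshold_with_override, the host name in the usage line and the numbers are all invented for illustration; only the Host / Type / Instance / WarningMax keywords come from collectd.

```shell
#!/bin/sh
# Sketch: print a threshold fragment with a general rule plus a per-host
# override. threshold_with_override is a hypothetical helper.
# Arguments: host name, default WarningMax, WarningMax for that host.
threshold_with_override() {
    host=$1 default_max=$2 host_max=$3
    cat <<EOF
<Plugin "threshold">
  <Type "cpu">
    Instance "user"
    WarningMax $default_max
  </Type>
  <Host "$host">
    <Type "cpu">
      Instance "user"
      WarningMax $host_max
    </Type>
  </Host>
</Plugin>
EOF
}

# Example:
#   threshold_with_override db1.local 85 95
```

With such a fragment, the block inside Host is the more specific one for that host, so it is the one that applies there, while every other host falls back to the general Type block.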

CPU plugin


<Type "cpu">
     Instance "user"
     WarningMax 85
     Hits 1
</Type>

Here the Plugin block before the Type block can be skipped. We say that the user value (user processes) should be monitored and that a warning should be sent if it reaches 85. Hits is the number of consecutive readings, one per Interval (see the main config), that must be beyond the threshold before a notification is generated. In our case it is 1, i.e. if the value is >= 85 for 10 seconds, a notification is generated. You can set a larger value, for example 6: if the value stays that high for a whole minute, there is something to worry about.
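
The relation between Hits and Interval is simple arithmetic. As a rough sketch, assuming one reading per Interval and treating the result as an approximation rather than an exact guarantee of collectd's behavior (hits_delay_seconds is a made-up helper name):

```shell
#!/bin/sh
# Sketch: approximate time a value must stay beyond the threshold before a
# notification is generated, assuming one reading per Interval.
# hits_delay_seconds is a hypothetical helper, not part of collectd.
hits_delay_seconds() {
    hits=$1
    interval=${2:-10}   # the Interval from the main config, 10 seconds here
    echo $(( hits * interval ))
}

# Example:
#   hits_delay_seconds 6      # Hits 6 with Interval 10
#   hits_delay_seconds 3 20   # Hits 3 with Interval 20
```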

Ping plugin


<Plugin "ping">
    <Type "ping_droprate">
        FailureMax 0.9
    </Type>
</Plugin>

As you can see, for ping we use the ping_droprate type. This data set holds a value from 0 to 1 (the share of lost packets). Accordingly, we generate a Failure notification if the value exceeds 0.9. If you set 1, it will not work :)

Memory plugin


<Plugin "memory">
    <Type "memory">
        Instance "free"
        WarningMin 25000000
    </Type>
</Plugin>

We select the free instance, since we monitor free memory. Here, the lower the free value, the worse, so we set WarningMin. If the value reaches the specified value or drops below it, a notification is generated.

Now the most interesting part: this is not in the documentation, and an example turned out to be hard to find, so I had to experiment.
Let's set up notifications for disk space.

Df plugin


<Plugin "df">
    Instance "root"
    <Type "df_complex-used">
#       DataSource "value"
        WarningMax 4025360000
        FailureMax 6025360000
        Percentage false
    </Type>
</Plugin>

So, in version 5.x the logic of creating the data files for the df plugin changed, so they are now addressed differently.
Instance: specifies which partition's data to look at.
Type: df_complex-used. The df_complex part is always required; after the dash comes, in our case, used, i.e. the data on used space.
DataSource can now be omitted, since the data set has only one field, value.
WarningMax / FailureMax: unfortunately, for some unknown reason percentage values cannot be used for this plugin, so specific values have to be hard-coded for each host. We also state explicitly in the block that percentages are not used. The question about this came up back in 2011, for version 4.9.1, but it still has no answer.
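
Since the values have to be absolute, they can at least be computed per host instead of being typed by hand. The sketch below derives byte values from a percentage of the partition size; both helper names are hypothetical, and the 80% / 90% figures in the usage lines are arbitrary examples, not the values used in this post.

```shell
#!/bin/sh
# Sketch: derive absolute byte thresholds from a percentage of partition size.
# pct_of_bytes and total_bytes are hypothetical helper names.

# Integer percentage of a byte count: pct_of_bytes <bytes> <percent>
pct_of_bytes() {
    echo $(( $1 * $2 / 100 ))
}

# Total size in bytes of the filesystem holding the given path.
# df -Pk prints sizes in 1024-byte blocks; line 2, field 2 is the total.
total_bytes() {
    kb=$(df -Pk "$1" | awk 'NR == 2 { print $2 }')
    echo $(( kb * 1024 ))
}

# Example: print threshold lines for the root partition.
#   total=$(total_bytes /)
#   echo "WarningMax $(pct_of_bytes "$total" 80)"
#   echo "FailureMax $(pct_of_bytes "$total" 90)"
```

The arithmetic is done in the shell rather than in awk so that large byte counts are printed as plain integers.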

That's basically all: the main plugins are configured, and so are their notifications.

Suggestions and questions are welcome. I will answer as far as I can.

