Ganglia and Nagios: Complementary Remote Monitoring

Sooner or later, every system administrator faces the problem of monitoring production servers, and there is a whole zoo of tools for the job. Nagios is very popular thanks to its powerful alerting mechanism. Also widely used are systems that focus on collecting the values of various parameters and tracking how they change over time, such as Cacti, Zabbix and Ganglia. Ganglia, however, has received undeservedly little attention from the Habr community. In this post I will try to fix that and show how flexible and useful this tool is.


Ganglia is an open-source monitoring system designed to handle thousands of nodes, originally developed at the University of California, Berkeley. It is easy to install and use, and its distinguishing features are high flexibility and scalability. Installing and configuring Ganglia is beyond the scope of this article; you can read about it here. I will also add that, unlike Cacti, Ganglia keeps collecting data about the system even while the system is disconnected from the network. When the server comes back online, it transfers all the accumulated data, so there are no gaps in the metric graphs.
You can read about installing and configuring Nagios, and about integrating it with Ganglia, here.
With these materials you can already set up Ganglia and teach Nagios to monitor it. In real life, however, we face more complex situations: a server on an internal LAN has to be monitored, metrics must be transferred over a secure channel, and so on. NRPE is the tool for such tasks (more details can be found here).
This is where the substance of the article begins. The setup: a remote server on the local network with Ganglia installed, and a server on the production network with Nagios installed. The task: monitor the remote system from Nagios.
First, let's install everything we need on the remote server.
Start with the Ganglia plugin check_ganglia_metric: follow its installation instructions and verify that the plugin works.
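To show what a check like check_ganglia_metric.py does under the hood, here is a minimal Python sketch that looks up one metric in Ganglia's XML dump. It is not the plugin's actual code. It assumes gmetad serves its XML on TCP port 8651 (adjust for your setup). The host `web1`, the cluster `lan` and the sample XML are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmetad_xml(host, port=8651, timeout=5.0):
    """Read the XML dump gmetad sends on connect (port 8651 is an assumption)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the server closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def find_metric(xml_data, metric_host, metric_name):
    """Return the VAL attribute of metric_name on metric_host, or None if absent."""
    root = ET.fromstring(xml_data)
    for host in root.iter("HOST"):
        if host.get("NAME") != metric_host:
            continue
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                return metric.get("VAL")
    return None


# Hypothetical, trimmed-down sample of gmetad output.
SAMPLE = b"""<GANGLIA_XML VERSION="3.1.7" SOURCE="gmetad">
 <GRID NAME="unspecified">
  <CLUSTER NAME="lan">
   <HOST NAME="web1" IP="10.0.0.5">
    <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=" "/>
   </HOST>
  </CLUSTER>
 </GRID>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print(find_metric(SAMPLE, "web1", "load_one"))  # prints 0.42
```

Against a live gmetad you would call `find_metric(fetch_gmetad_xml("your_host"), "web1", "load_one")`.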
Then install nagios-nrpe-server:
sudo aptitude install nagios-nrpe-server

Next, open the NRPE config:
sudo nano /etc/nagios/nrpe.cfg

Edit the following lines.
Trusted hosts (put the address of your Nagios server here):
allowed_hosts =
Allow arguments to be passed to plugins:
dont_blame_nrpe = 1
At the end of the config, the list of commands NRPE may execute:
command[some_name] = path args
Add the path to check_ganglia_metric here:
command[check_ganglia_metric] = /usr/local/bin/check_ganglia_metric.py --gmetad_host=your_host --metric_host=metric_host_you_need --metric_name=$ARG1$ --warning=$ARG2$ --critical=$ARG3$
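With dont_blame_nrpe = 1, the NRPE daemon replaces the $ARG1$, $ARG2$, ... macros in the command line with the arguments sent by check_nrpe. The Python sketch below models that substitution for illustration only. NRPE itself is written in C, and the set of rejected shell metacharacters here is an approximation, not the daemon's exact list. In this sketch, a macro with no matching argument expands to an empty string.

```python
import re

# Approximation of the shell metacharacters NRPE refuses in client arguments
# (an assumption for this sketch, not the daemon's exact list).
NASTY_CHARS = set("|`&><'\"\\[]{};\r\n")


def expand_nrpe_command(template, args):
    """Substitute $ARG1$..$ARGn$ in an NRPE command template (simplified model)."""
    for arg in args:
        if any(ch in NASTY_CHARS for ch in arg):
            raise ValueError("argument contains forbidden characters: %r" % arg)

    def replace(match):
        index = int(match.group(1)) - 1
        # In this sketch, a macro with no matching argument becomes "".
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", replace, template)


if __name__ == "__main__":
    template = ("/usr/local/bin/check_ganglia_metric.py --gmetad_host=your_host "
                "--metric_host=metric_host_you_need --metric_name=$ARG1$ "
                "--warning=$ARG2$ --critical=$ARG3$")
    print(expand_nrpe_command(template, ["load_one", "4", "8"]))
```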

Save and restart the NRPE daemon:
sudo /etc/init.d/nagios-nrpe-server restart

Now go to the Nagios server (which you have already set up using the links above) and add a service definition:
define service{
                    use    generic-service
                    host_name   your_remote_host
                    service_description   remote_ganglia_checking
                    check_command   check_nrpe!check_ganglia_metric!$ARG1$ $ARG2$ $ARG3$
}
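In a Nagios service definition, the text after each "!" in check_command becomes $ARG1$, $ARG2$, ... of the Nagios-side command definition (here check_nrpe). These macros are separate from the $ARGn$ macros that NRPE expands on the remote host. The Python sketch below shows that split. The argument values are hypothetical, and escaped "\!" sequences are not handled.

```python
def split_check_command(check_command):
    """Split a Nagios check_command string into (command_name, macros)."""
    # Simplification: escaped "\!" sequences are not handled.
    name, *values = check_command.split("!")
    macros = {"ARG%d" % (i + 1): value for i, value in enumerate(values)}
    return name, macros


if __name__ == "__main__":
    name, macros = split_check_command("check_nrpe!check_ganglia_metric!load_one 4 8")
    print(name)    # check_nrpe
    print(macros)  # {'ARG1': 'check_ganglia_metric', 'ARG2': 'load_one 4 8'}
```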

Restart Nagios, and you will see that it puts our metric into the WARNING state and reports that it cannot parse the plugin's response. Time for some handiwork. ;)
We need a wrapper script that runs check_ganglia_metric.py.
How to write Nagios plugins is described in the article "We write our plugin for nagios" by Joka.
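Nagios reads a plugin's result from its exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. It displays the first line the plugin prints, and performance data may follow a "|" on that line. The Python sketch below implements that contract. It assumes simple "higher is worse" numeric thresholds; real plugins may use a richer threshold syntax.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(value, warning, critical, label="metric"):
    """Compare value to thresholds (higher is worse); return (exit_code, status_line)."""
    if value >= critical:
        code, state = CRITICAL, "CRITICAL"
    elif value >= warning:
        code, state = WARNING, "WARNING"
    else:
        code, state = OK, "OK"
    # Status text, then performance data in the label=value;warn;crit form.
    line = "%s %s - value %s | %s=%s;%s;%s" % (
        label, state, value, label, value, warning, critical)
    return code, line


if __name__ == "__main__":
    code, line = evaluate(5, 4, 8, "load_one")
    print(line)  # load_one WARNING - value 5 | load_one=5;4;8
    print(code)  # 1 (a real plugin would finish with sys.exit(code))
```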
Here is the plugin code, written in Python:
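#!/usr/bin/python2.6
# -*- coding: utf-8 -*-
# Adjust the interpreter path for your system; the code also runs under Python 3.
# Wrapper for Nagios/NRPE: runs check_ganglia_metric.py via sudo
# and hands its exit code back to Nagios.
import subprocess
import sys

CHECK_SCRIPT = "/usr/local/bin/check_ganglia_metric.py"

# Five arguments: gmetad host, metric host, metric name, warning, critical
if len(sys.argv) < 6:
    print("wrong config data")
    sys.exit(3)  # 3 = UNKNOWN in Nagios

gmetad_host, metric_host, metric_name, warning, critical = sys.argv[1:6]

# Passing a list avoids building a shell string and re-splitting it
args = ["sudo", CHECK_SCRIPT,
        "--gmetad_host=" + gmetad_host,
        "--metric_host=" + metric_host,
        "--metric_name=" + metric_name,
        "--warning=" + warning,
        "--critical=" + critical]

# Nagios judges the check by the exit code, so pass on the plugin's result
sys.exit(subprocess.call(args))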
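Nagios looks only at the exit code of the command NRPE runs. A wrapper must therefore return the exit code of check_ganglia_metric.py instead of always exiting with 0. The Python sketch below tests that behaviour without any Ganglia setup. It uses a throwaway child process as a hypothetical stand-in for the real plugin.

```python
import subprocess
import sys


def run_check(argv):
    """Run a child check and return its exit code, for the wrapper to pass to sys.exit()."""
    return subprocess.call(argv)


if __name__ == "__main__":
    # Hypothetical stand-in for check_ganglia_metric.py: prints a status line, exits 2.
    fake_plugin = [sys.executable, "-c",
                   "import sys; print('CRITICAL - fake metric'); sys.exit(2)"]
    code = run_check(fake_plugin)
    print("wrapper should exit with", code)
```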


The check_ganglia_metric.py script creates a check_ganglia_metric.cache file at run time. When it is launched as the nagios user, it tries to create this file in a directory owned by root and fails.
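To confirm that this is the problem on your system, you can check whether the nagios user is allowed to create the cache file. Below is a minimal Python sketch. The cache path you pass in is an assumption, because the location check_ganglia_metric.py uses depends on how it was installed. Run the check as the nagios user, for example with `sudo -u nagios python3 check_cache.py`, where `check_cache.py` is a hypothetical file name.

```python
import os
import tempfile


def can_write_cache(cache_path):
    """Return True if the current (real) user could create or overwrite cache_path."""
    if os.path.exists(cache_path):
        return os.access(cache_path, os.W_OK)
    # Creating a new file needs write and search (execute) permission on the directory.
    directory = os.path.dirname(os.path.abspath(cache_path))
    return os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(can_write_cache(os.path.join(tmp, "check_ganglia_metric.cache")))  # True
```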
So, to run check_ganglia_metric.py, the nagios user would need root privileges, and granting those is a very bad idea. Instead, we can allow it to run only this one script:
sudo visudo

nagios ALL=(ALL) NOPASSWD: /usr/local/bin/check_ganglia_metric.py

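Save the wrapper as /usr/lib/nagios/plugins/ganglia_support.py and make it executable (chmod +x). Then edit the NRPE config again and add our plugin to the list of commands:
command[check_ganglia] = /usr/lib/nagios/plugins/ganglia_support.py $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$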

Save and restart NRPE.
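Before touching the Nagios configuration, it can help to call the new NRPE command by hand from the Nagios server with check_nrpe. Its -H, -c and -a options select the host, the command name and the command's arguments, and -a must come last. The Python sketch below only builds that command line. The check_nrpe path and all argument values are assumptions.

```python
# Assumption: Debian's NRPE plugin package installs check_nrpe here.
CHECK_NRPE = "/usr/lib/nagios/plugins/check_nrpe"


def build_check_nrpe(host, command, args):
    """Build argv for a manual check_nrpe call; -a is last and takes all remaining words."""
    return [CHECK_NRPE, "-H", host, "-c", command, "-a"] + [str(a) for a in args]


if __name__ == "__main__":
    # Hypothetical values: NRPE host, then gmetad host, metric host, metric, warning, critical.
    argv = build_check_nrpe("10.0.0.5", "check_ganglia",
                            ["your_host", "metric_host_you_need", "load_one", 4, 8])
    print(" ".join(argv))
    # On the Nagios server you could then run it with subprocess.call(argv).
```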
Then update the Nagios service definition:
define service{
                    use    generic-service
                    host_name   your_remote_host
                    service_description   remote_ganglia_checking
                    check_command   check_nrpe!check_ganglia!$ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$
}

Save and restart Nagios. Everything now works.
All paths and commands are given for Debian.
I hope this proves useful and saves you time and coffee when setting up monitoring on production servers.