Configuring automatic restart of Windows services in Nagios XI

A product as well known as Nagios needs little introduction, since many people already use it. For those who have not heard of it: Nagios is a monitoring system that can do many other useful things besides monitoring. I am currently studying it closely, and in this article I will show how it can make a system administrator's work a little easier.

To begin with, a clarification: I assume the organization already has a Nagios server and that monitoring of some hosts and/or services is already configured. Suppose there is a server called PrintServer whose spooler service often stops. The case is quite commonplace, but it is only an example, and many useful things can be built on the same basis. The first thing to do is set up monitoring for this service.

Log in to the Nagios web interface as an administrator and select Configure at the top of the page. To add a new check, run the Monitoring Wizard. Later on, I recommend going straight to Core Config Manager (in the left-hand column) to edit and add hosts, but for a first acquaintance the wizard is the better choice.



The wizard consists of several simple steps in which the necessary parameters are indicated.

1. Since this is a Windows service, select Windows Server.
2. Enter the IP address.
3. This step is more interesting. In the Windows Agent section, you can download the latest version of the agent for Windows (via the link to the Nagios.com website; it is installed on the client side) and enter the agent password. The password is set on the server side and configured on the client side, and it is used for client-server interaction. The next section is Server Metrics, where you can configure various metrics to check. Everything there is self-explanatory, and because only the service needs to be configured, clear all the checkboxes in this section. Next is the Services section. This is where you register the service to be monitored and tick its checkbox. In the Windows Service column you must specify the Service Name (not the display name); this is important.



Next come the Processes and Performance Counters sections. They will most likely come in handy later, but for now skip them and go to step 4.
4. Here you can set how often the check runs (in minutes), the retry interval, and the number of checks before Nagios generates an alert. These settings depend entirely on the specific task, so I will skip this section. Now you can click Finish and look at the wizard report. If all three stages were successful, the service has been created.
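As a rough guide for choosing the values in step 4, the sketch below estimates how long a stopped service can go unnoticed before Nagios treats the problem as confirmed. It assumes that the attempt count includes the first failed check, so the number of retries is one less than the attempt count. The function name and the sample values are hypothetical, and the result is only an approximation.

```shell
#!/bin/sh
# Sketch: approximate worst-case delay (in minutes) before a stopped
# service is confirmed as a problem.
# Assumption: the attempt count includes the first failed check.
worst_case_minutes() {
    check_interval=$1   # minutes between normal checks
    retry_interval=$2   # minutes between re-checks after a failure
    max_attempts=$3     # checks needed before the problem is confirmed
    echo $(( $check_interval + ($max_attempts - 1) * $retry_interval ))
}

# Hypothetical values: check every 5 minutes, retry every minute, 5 attempts.
worst_case_minutes 5 1 5
```

With these sample values the estimate is 9 minutes, which is also roughly how long the service may stay down before an automatic restart is attempted if the handler waits for a confirmed problem.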



Next, select Core Config Manager from the left menu and then Services in the Monitoring section below it. Type Print Spooler into the search box and find the config that was just created. Nagios now monitors the status of the service, so you can move on to configuring the event-driven restart. The client side has to be configured first. Log on to the monitored server and install the Nagios agent. After installation, go to the folder C:\Program Files\NSClient++. The path may be different on your system; what matters is getting into the program's installation folder. Find NSC.ini or nsclient.ini and open it for editing. A few options need to be changed:

1. Uncomment the CheckExternalScripts.dll entry at the beginning of the file by removing the leading ";";
2. Uncomment allow_arguments and set it to 1 instead of 0;
3. Add the entry svc_restart=scripts\svc_restart.bat "$ARG1$" to the [External Scripts] section. The name on the left must match the command name that check_nrpe will request later (svc_restart).

Save the changes and close the file. Next, find the scripts folder inside the same installation folder and create the batch file svc_restart.bat in it with the following contents:

@echo off
rem Restart the Windows service whose name is passed as the first argument
net stop %1
net start %1
exit 0

After that, restart the NSClient++ service so that it picks up the changes to the .ini file. The client setup is now complete, and what remains is to configure the server.

The restart process looks like this:

1. The service stops, and in Nagios an event is generated that the status of the service is Critical;
2. The event handler launches the command configured for it;
3. The command runs the script located on the Nagios server;
4. The script passes the command with arguments to NSClient ++;
5. NSClient++ runs the .bat script located on the client, which restarts the service.

In principle, everything is simple. Now let's go through it in order. Service monitoring is already configured, so an event will be generated when the service stops. Before moving on, create a command for the event handler. Go to Configure => Core Config Manager => Commands, create a new command with Add New, and enter the following parameters:

Command Name = svc_restart
Command Line = $USER1$/svc_restart.sh $SERVICESTATE$ $HOSTADDRESS$ $_SERVICESERVICE$
Command Type = misc command
Also tick the Active checkbox.



The command is created; click Apply Configuration. Next, go to the Services section and open the Print Spooler service created earlier. On the Check Settings tab, select the svc_restart command in the Event Handler field (it will run when the event occurs) and enable the event handler itself.



Next, go to the Misc Settings tab and open Manage Variable Definitions. Fill in the following:
Variable Name = _SERVICE
Variable Value = spooler
Then click Insert and Close.
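
The link between this variable and the $_SERVICESERVICE$ macro in the command line may not be obvious. Nagios exposes a service custom variable through a macro built from the prefix $_SERVICE followed by the variable name in upper case without its leading underscore. The sketch below only illustrates that naming rule; custom_macro is a hypothetical helper, not something Nagios provides.

```shell
#!/bin/sh
# Sketch: derive the macro name for a service custom variable.
# custom_macro is a hypothetical helper used only for illustration.
custom_macro() {
    # $1 = variable name as entered in the UI, e.g. _SERVICE
    name=$(printf '%s' "$1" | sed 's/^_//' | tr '[:lower:]' '[:upper:]')
    printf '$_SERVICE%s$\n' "$name"
}

custom_macro _SERVICE
```

So the variable _SERVICE with the value spooler is what ends up as the third argument of the event handler command.
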
This completes the configuration of the service, so you can save and apply the changes (Save & Apply Configuration). From now on, when the service stops, Nagios will run the svc_restart command, which in turn will run svc_restart.sh with the necessary parameters. The problem is that this script does not exist yet and has to be created. The command looks for the script in the local folder with scripts and binaries on the server, so the next step is to log in to the server console, go to the /usr/local/nagios/libexec folder, and use any convenient text editor (I used nano) to create the file svc_restart.sh there with the following contents:

#!/bin/sh
# Event handler for restarting Windows services
# $1 = service state, $2 = host address, $3 = Windows service name
case "$1" in
OK)
    ;;
WARNING)
    ;;
UNKNOWN)
    ;;
CRITICAL)
    # The service is down: ask NSClient++ to run the svc_restart script
    /usr/local/nagios/libexec/check_nrpe -H "$2" -p 5666 -c svc_restart -a "$3"
    ;;
esac

exit 0

Now change the owner and permissions of svc_restart.sh with the following console commands:

chown nagios:nagios /usr/local/nagios/libexec/svc_restart.sh
chmod 775 /usr/local/nagios/libexec/svc_restart.sh
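
A quick way to confirm that the file is in place and usable is a small check like the one below. This is only a sketch: check_handler is a hypothetical helper, and it tests just three things (the file exists, it is executable, and it starts with an interpreter line).

```shell
#!/bin/sh
# Sketch: sanity-check an event handler script before relying on it.
# check_handler is a hypothetical helper, not part of Nagios.
check_handler() {
    f=$1
    [ -f "$f" ] || { echo "missing"; return 1; }
    [ -x "$f" ] || { echo "not executable"; return 1; }
    # A script that is run directly needs an interpreter line.
    head -n 1 "$f" | grep -q '^#!' || { echo "no shebang"; return 1; }
    echo "ok"
}

# Path taken from the steps above; "|| true" keeps the sketch from aborting.
check_handler /usr/local/nagios/libexec/svc_restart.sh || true
```
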

This completes the setup. If everything was done according to the instructions, there should be no problems. To finish, here is the whole mechanism once more, followed by a bit of troubleshooting:

1. The service stops, and the server learns from the client that the service status is Critical;
2. The server creates an event with the arguments hostname, servicename, and servicestate;
3. The event handler executes the svc_restart command with the specified parameters;
4. The command starts svc_restart.sh with the necessary parameters;
5. svc_restart.sh launches check_nrpe with the necessary parameters;
6. check_nrpe tells the client to run svc_restart with the specified parameters;
7. The client looks in the ini file and finds that svc_restart is a bat file in the scripts folder;
8. The bat file is executed with the specified parameters;
9. The bat file restarts the service.
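
In the chain above the restart is attempted on every Critical event. Nagios also provides the standard macros $SERVICESTATETYPE$ (SOFT or HARD) and $SERVICEATTEMPT$, which a handler can use to wait until the problem is confirmed. The sketch below shows one possible decision rule. It assumes the command line were extended to pass those two macros, which the setup in this article does not do, and both should_restart and the threshold of 3 are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether to restart, given state, state type and attempt.
# Assumes $SERVICESTATETYPE$ and $SERVICEATTEMPT$ are passed as extra
# arguments; the setup described in the article passes only the state.
should_restart() {
    state=$1; state_type=$2; attempt=$3
    [ "$state" = "CRITICAL" ] || return 1
    case "$state_type" in
        HARD) return 0 ;;                # problem confirmed, restart
        SOFT) [ "$attempt" -ge 3 ] ;;    # hypothetical threshold
        *)    return 1 ;;
    esac
}

if should_restart CRITICAL SOFT 1; then echo restart; else echo wait; fi
```
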

If something does not work:

1. Run the bat file by hand on the client. If it works, move on to the next step;
2. On the Nagios server, open a console and try the following, substituting the client's address after -H:
cd /usr/local/nagios/libexec
./check_nrpe -H <client address> -p 5666 -c svc_restart -a spooler
This checks whether the external script handler works on the client. If it does, move on to the next step;
3. In the console, try the following, again with the client's address as the second argument:
cd /usr/local/nagios/libexec
./svc_restart.sh CRITICAL <client address> spooler
This verifies that the svc_restart.sh script itself is written correctly.
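
When comparing what the handler sends with what you type by hand, it can help to print the exact command line without running it. The following is a minimal dry-run sketch; nrpe_cmd is a hypothetical helper and 192.0.2.10 is a documentation-only sample address.

```shell
#!/bin/sh
# Sketch: print the check_nrpe command line the handler would run,
# without executing it. nrpe_cmd is a hypothetical helper.
nrpe_cmd() {
    host=$1; service=$2
    if [ -z "$host" ] || [ -z "$service" ]; then
        echo "usage: nrpe_cmd <host> <service>" >&2
        return 2
    fi
    printf '%s -H "%s" -p 5666 -c svc_restart -a "%s"\n' \
        /usr/local/nagios/libexec/check_nrpe "$host" "$service"
}

# Sample values only; replace them with the real client address and service.
nrpe_cmd 192.0.2.10 spooler
```

If either argument is empty, the helper prints a usage message and returns a non-zero status, which mirrors the most common mistake in step 3: forgetting the host address.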

This article was written using the Nagios manual from the official site. Unfortunately, I did not keep the links, but they should be easy to find with a search.
