zim32 June 4, 2019 at 14:39

Poor man's monitoring or server monitoring from the console

Hello, dear readers. In this article I will tell you about my homemade “bike” (my reinvented wheel), which I use to monitor various things without leaving the console.

At one point I found myself with quite a lot of different projects and servers, and I never got around to setting up proper monitoring.

In the modern world, “proper” monitoring means deploying a whole bunch of software and configuring all of it. You know how it goes: Docker, the Elastic Stack, and off you go. For me that was too much overhead. I wanted something I could put into production in no time.

I looked at Simple monitor, written in Python. It was the closest to what I wanted in spirit, but it lacked quite a few features. At the same time I wanted to learn Go... well, you know how it usually starts.

So I picked up the Go ~~welding torch~~ and put together this bike.

Cli Monitoring is written in Go and consists of a set of binaries, each of which reads data from stdin, performs one specific task, and writes the result to stdout.

There are four types of binaries in total: metrics, processors, filters, and outputs.

Metrics, as the name implies, collect data and usually come first in the chain.
Processors sit in the middle and transform the data or perform other utility functions.
Filters are almost like processors, but instead of changing the data they either pass it on or drop it, depending on a condition.
Outputs sit at the end of the chain and send notifications to various services.

A full chain of commands usually looks like this:

some_metric | processor_1 | processor_2 ... | cm_p_message | output_1 | output_2 ...

Any link in this chain can be an arbitrary Linux command, as long as it reads data from stdin and writes it to stdout without buffering. There is one small catch related to line breaks, but more on that later.

Binary names follow the pattern cm_{type}_{name}, where type is one of four letters (m, p, f, or o) and name is the name of the command.

For example, cm_m_cpu is a metric that writes CPU statistics in JSON format to stdout.

And cm_p_debounce is a processor that lets through only one message per given interval.

There is one special processor, cm_p_message, which must come before the first output. It builds a message in the format that the outputs expect.

To process JSON and check conditions in the console, I used the jq utility. It is something like sed, but for JSON.

For example, this is what CPU monitoring looks like in the end:

cm_m_cpu | cm_p_eot2nl | jq -cM --unbuffered 'if .LoadAvg1 > 1 then .LoadAvg1 else false end' | cm_p_nl2eot | cm_f_regex -e '\d+' | cm_p_debounce -i 60 | cm_p_message -m 'Load average is {stdin}' | cm_o_telegram

And this is how you can monitor the number of messages in a RabbitMQ queue:

while true; do rabbitmqctl list_queues -p queue_name | grep -Po --line-buffered '\d+'; sleep 60; done | jq -cM '. > 10000' --unbuffered | cm_p_nl2eot | cm_f_true | cm_p_message -m 'There are more than 10000 tasks in rabbit queue' | cm_o_opsgenie

And this is how you can detect that nothing has been written to a file for 10 seconds:

tail -f out.log | cm_p_nl2eot | cm_p_watchdog -i 10 | cm_p_debounce -i 3600 | cm_p_message -m 'No write to out.log for 10 seconds' -s 'alert' | cm_o_telegram

Don't rush to close the tab: let's now break down what is happening in the first example.

1) The cm_m_cpu metric writes a JSON string once per interval (set with the -i parameter; the default is one second). For example: {"LoadAvg1": 2.0332031, "LoadAvg2": 1.9018555, "LoadAvg3": 1.8623047}

2) cm_p_eot2nl is one of the utility commands: it converts the EOT character to the LF character. The point is that, to avoid problems with line breaks, I made all my binaries read data up to the ASCII EOT (End of Transmission) character rather than up to a newline. This lets you safely pass multi-line data between commands.

Therefore, any other (non-cm) command in the chain should be wrapped like this:
cm_p_eot2nl | any other command | cm_p_nl2eot.

3) This is followed by a call to the jq utility, which checks the LoadAvg1 field: if it is greater than 1, jq passes the value on; otherwise it outputs false. cm_p_nl2eot then converts jq's newline-terminated output back into EOT-terminated messages.

4) Next, we need to drop every false message from the chain. To do this, we apply the cm_f_regex filter, which takes a string as input, matches it against a regular expression and, if it matches, passes it further. Otherwise, the line is simply discarded.

You could also use plain grep, but firstly it buffers its output, so the full command gets a little longer (grep --line-buffered), and secondly cm_f_regex makes it very easy to output capture groups. For example:

cm_f_regex -e '(\d+)-(\d+)' -o '{1}/{2}'

This converts the line 123-345 into 123/345.

5) The cm_p_debounce processor, in this case, takes our LoadAvg1 value and passes it down the chain at most once every 60 seconds, so that you don't spam yourself. You can set any other interval.

6) Almost done. All that remains is to build a message and send it to Telegram. The message is built by the special cm_p_message command. It simply takes a string as input, creates JSON with Severity, Message, and other fields, and passes it on to the outputs. If we had not passed the -m option, stdin itself would become the message, i.e. just a number, our LoadAvg1. That is not very informative.
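A minimal Go sketch of what cm_p_message might do. Only the Severity and Message field names, the -m and -s options, and the {stdin} placeholder come from the article's examples; the Host field (perhaps filled from host_name in the [global] ini section), the default severity, and the exact substitution rule are assumptions:

```go
package main

import (
	"bufio"
	"encoding/json"
	"flag"
	"fmt"
	"os"
	"strings"
)

// Message is a guess at the envelope cm_p_message produces; Severity and
// Message are named in the article, Host is a hypothetical extra field.
type Message struct {
	Severity string
	Message  string
	Host     string
}

// buildMessage substitutes the incoming text for {stdin} in tmpl; with an
// empty template the incoming text itself becomes the message.
func buildMessage(tmpl, severity, host, input string) ([]byte, error) {
	text := input
	if tmpl != "" {
		text = strings.ReplaceAll(tmpl, "{stdin}", input)
	}
	return json.Marshal(Message{Severity: severity, Message: text, Host: host})
}

func main() {
	tmpl := flag.String("m", "", "message template; {stdin} is replaced by the input")
	sev := flag.String("s", "info", "severity (this default is an assumption)")
	flag.Parse()
	host, _ := os.Hostname()

	r := bufio.NewReader(os.Stdin)
	for {
		in, err := r.ReadString('\x04')
		in = strings.TrimSpace(strings.TrimSuffix(in, "\x04"))
		if in != "" {
			if b, mErr := buildMessage(*tmpl, *sev, host, in); mErr == nil {
				fmt.Print(string(b) + "\x04") // EOT-terminated JSON for the outputs
			}
		}
		if err != nil {
			return
		}
	}
}
```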

7) Finally, cm_o_telegram simply sends the message it receives on input to Telegram. The Telegram settings are stored in the ini file.
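A minimal Go sketch of an output like cm_o_telegram, using the Telegram Bot API's sendMessage method (a form-encoded POST to https://api.telegram.org/bot<token>/sendMessage with chat_id and text). The -token and -cid flags are hypothetical and merely mirror the [telegram] keys of the ini file; reading the Message field of the incoming JSON and passing every input through unchanged (so that, as in the chain pattern above, another output can follow) are assumptions about the tool:

```go
package main

import (
	"bufio"
	"encoding/json"
	"flag"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strings"
)

// newSendMessageRequest builds a Telegram Bot API sendMessage call
// (POST https://api.telegram.org/bot<token>/sendMessage, form-encoded).
func newSendMessageRequest(token, chatID, text string) (*http.Request, error) {
	form := url.Values{}
	form.Set("chat_id", chatID)
	form.Set("text", text)
	req, err := http.NewRequest(http.MethodPost,
		"https://api.telegram.org/bot"+token+"/sendMessage",
		strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	// Hypothetical flags mirroring the [telegram] cid/token ini keys.
	token := flag.String("token", "", "bot token")
	cid := flag.String("cid", "", "chat id")
	flag.Parse()

	r := bufio.NewReader(os.Stdin)
	for {
		raw, err := r.ReadString('\x04')
		body := strings.TrimSuffix(raw, "\x04")
		if body != "" {
			// Assumption: input is cm_p_message JSON; fall back to the raw text.
			text := body
			var msg struct{ Message string }
			if json.Unmarshal([]byte(body), &msg) == nil && msg.Message != "" {
				text = msg.Message
			}
			if req, rErr := newSendMessageRequest(*token, *cid, text); rErr == nil {
				if resp, dErr := http.DefaultClient.Do(req); dErr != nil {
					fmt.Fprintln(os.Stderr, "telegram:", dErr)
				} else {
					resp.Body.Close()
				}
			}
			// Pass the message through unchanged so another output can follow.
			fmt.Print(body + "\x04")
		}
		if err != nil {
			return
		}
	}
}
```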

Configuration

All parameters accepted by the binaries can be specified in the ini file. Parameters passed as command-line arguments take precedence over the ini file.

The ini file is selected in the following order: 1) the file cm.config.ini in the current working directory; 2) the file /etc/cm/config.ini, if the file from item 1 is not found. The file format is as follows:

[global]
host_name=override host name for this machine

[telegram]
cid=....
token=....

[opsgenie]
apiToken=...
apiEndpoint=...
......

[debounce]
i=3600

Production

On a real server, I create a file, for example cpu.sh, containing the whole chain of commands. Then I add something like this to the crontab:

*/5 * * * * flock -n /etc/cm/cpu.lock /etc/cm/cpu.sh > /dev/null

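If the chain dies for some reason, cron starts it again on its next run, and flock -n makes sure a second copy is never started while the first one still holds the lock. And that's all! Exactly the simplicity I was missing.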

That's the tool; maybe someone will find it convenient. For me, the convenience is that I don't have to set up a lot of unnecessary things just to monitor the necessary ones. And setup is easy: clone the repository, add the path to the binaries to $PATH, and that's it.

Please don't judge too harshly. I wrote the tool for myself, and the set of commands is still small. But I will be glad to receive any feedback and suggestions. Thank you all for your attention.
