How I taught Zabbix to watch over my node and report problems

    Hi, Habr!

    I am currently working on a blockchain messenger project together with a team of colleagues. If you're interested, see the links in my profile or ask in the comments.

    Blockchain development is a new and largely unexplored area, so sometimes you have to use tools in very non-standard ways: hammering nails with a microscope, so to speak. That's why I decided to keep this blog and share interesting cases from practice. Today's post is about how I set up instant notifications about the status of my node so I can quickly bring it back to life.



    The plan I followed


    I set myself the following task: every time the node goes down or stops working properly, I should receive an instant notification about it. We live in a progressive age and are used to getting all the important information instantly, right?

    I decided that to accomplish this I would hook Zabbix up to Slack (it is our working tool on the project). Zabbix would monitor the node and send failure messages to me in a Slack direct message.

    Implementation: step by step


    Step 1: Zabbix


    Of course, Zabbix has no standard pre-configured monitoring tools for our node. So my first impulse was to check the availability of the node's port using the net.tcp.listen[port] key.
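
    For example, assuming the node's API listens on port 36666 (the port used in the curl requests below), such a check would be a plain agent item:

    net.tcp.listen[36666]

    It returns 1 when the port is in the LISTEN state and 0 otherwise, which is exactly the "is it up at all" kind of check.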

    But there is one "but": it happens that the node is active and listening on the port, yet not functioning correctly. So I was faced with the need to determine the main indicator of the node actually working.

    What should the node do? Right, grow. That growth will be the main indicator. So I decided to use the system.run[command, mode] key together with curl http://127.0.0.1:36666/api/blocks/getHeight.

    As a result, we get a string in JSON format:

    {"success":true,"nodeTimestamp":XXXXXXX,"height":XXXXXXX}

    The jq package (https://stedolan.github.io/jq/) came to the rescue for the JSON parsing task. But after simply piping the result through curl http://127.0.0.1:36666/api/blocks/getHeight | jq .height, instead of the long-awaited height I got a response mixed with information about the execution of the curl command itself.
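
    In case anyone wonders what that noise is: when its output goes into a pipe, curl prints its progress meter, and in my case it ended up mixed into the item value. It looks roughly like this (the numbers here are just for illustration):

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100    57  100    57    0     0   9500      0 --:--:-- --:--:-- --:--:--  9500
    XXXXXXX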



    That redundant information had to be removed, and the -s key, also known as --silent, came to the rescue. As a result, using the Zabbix key system.run[curl -s http://127.0.0.1:36666/api/blocks/getHeight | jq .height] we get the node height in the desired form XXXXXXXX, convenient for monitoring.
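
    A quick way to sanity-check the item from the shell of the node host, before adding it in the web interface, is zabbix_get (just a convenience; note that system.run requires remote commands to be allowed on the agent, EnableRemoteCommands=1 in zabbix_agentd.conf in the Zabbix versions of that time):

    zabbix_get -s 127.0.0.1 -k 'system.run[curl -s http://127.0.0.1:36666/api/blocks/getHeight | jq .height]'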



    A trigger was also required to raise the alert. The plan was this: compare the last and the previous values, and fire the trigger if the growth is less than one.

    {ADAMANT Node Monitoring:system.run[curl -s http://127.0.0.1:36666/api/blocks/getHeight | jq .height].change()}<1

    Step 2: Zabbix to Slack




    The next task was to get a notification in Slack when the trigger fires. I took https://github.com/ericoc/zabbix-slack-alertscript as a basis.

    The instructions there are clear, but using emoticons to tell severity levels apart is not serious. Highlighting with colored stripes is much more interesting. After reworking the script, this is what remained:

    # Slack incoming webhook URL and sender name
    url='********************************'
    username='Server'

    # Arguments passed by Zabbix: recipient, subject (severity), message
    to="$1"
    subject="$2"
    recoversub='^RECOVER(Y|ED)?$'   # recovery pattern from the original script (not used below)

    # Pick the attachment color by severity
    if [[ "$subject" == 'Warning' ]]; then
        color='#EBFF00'
    elif [[ "$subject" == 'Not classified' ]]; then
        color='#D8E3FF'
    elif [[ "$subject" == 'Information' ]]; then
        color='#0049FF'
    elif [[ "$subject" == 'Average' ]]; then
        color='#FFC200'
    elif [[ "$subject" == 'High' ]]; then
        color='#FF5500'
    elif [[ "$subject" == 'Disaster' ]]; then
        color='#FF0000'
    else
        color='#00FF06'
    fi

    # Build the Slack attachment payload and post it to the webhook
    message="${subject} \n $3"
    payload="payload={\"attachments\": [{\"color\": \"${color}\", \"text\": \"${message}\"}]}"
    curl -m 5 --data-urlencode "${payload}" $url
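
    For reference: Zabbix runs such a script as a custom alert script media type. It lives in the directory set by AlertScriptsPath in zabbix_server.conf, and the recipient, subject and message arrive as the three positional arguments (depending on the Zabbix version they are passed automatically or configured as script parameters in the media type). A manual test, with a made-up script name and message, looks like this:

    # $1 = send-to, $2 = subject (severity), $3 = message body
    ./slack.sh '@here' 'High' 'Node height has stopped growing'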

    Conclusions


    As a moral, a couple of words on why convenient monitoring is so important. The sooner you learn about a problem, the faster you fix it and the less severe the consequences will be. As they say, what you pick up right away doesn't count as dropped. Slack, among other things, has group chats, so the team can join in fixing the problem and coordinate actions. By the way, our project is open source, and we treat other open source projects with great respect. My experiment has once again shown that open source is a good thing.
