Zabbix - monitoring OSPF neighbors using SNMPv3 TRAPs, pain and despair

Technical task

There is a network of geographically distributed data centers with a ton of VRFs and a constantly changing list of OSPF neighbors. The neighbors need to be tracked:

  1. State: raise an alarm if a neighbor's state is not FULL
  2. Count: if a neighbor disappears, an alarm must be raised as well

The monitoring system is already in place - Zabbix 3.4 on Debian 9.x Linux - and it is desirable to use it.

First attempt: the quick way

The protocol is widespread and the monitoring system is well known, so surely I am not the first to want to solve this problem, and most likely it has already been solved.

We search for "zabbix ospf" and the first link leads to a template. What happiness - now I will import it, polish it to fit my needs and everything will be fine.
We check how it works - everything seems fine, the states are monitored, but when a neighbor goes to the DOWN state, we receive a very informative message from Zabbix:

No Such Instance currently exists at this OID

and info

The item is not discovered anymore and will be deleted in 29d 23h 57m (on 2018-08-19 at 08:52)

What happened? The problem is old and well known on the forums: when an OSPF neighbor disappears, all the OIDs associated with it are simply removed on the network device.

Yes, there is a solution - create a nodata trigger, ok, create:

{Template - SNMPv3 - OSPF Discovery:ospfNbrState[{#SNMPVALUE}].nodata(120)}=0

We see in the dashboard:


Basically ... usable

But out of the box, LLD only discovers neighbors from the default VRF. Of course, this can be solved with SNMP contexts, but I really did not want to go that way: you have to go through every device and configure a context for each OSPF process or VRF, then make a copy of the Discovery rule in the template for each context. All in all, a lot of fuss, and every new OSPF process means changing something in several places. You could paper over this with scripts and change everything through the Zabbix API, but I did not want heavy customization; I wanted to rely on Zabbix's built-in functionality as much as possible. There is also a CISCO-CONTEXT-MAPPING-MIB, from which you can pull all the context-to-OSPF/VRF mappings, but I did not figure out how to hook it up to LLD for my case. If someone knows how to cook Zabbix that well, welcome to the comments, or better yet, write a full-fledged article.

Second attempt

After a couple of hours of searching, hints on forums and the depths of memory brought up SNMP TRAPs: this is when we do not poll the device, but the device itself reports that something has changed. It looks like our monitoring system supports this out of the box, and the equipment can send traps too, including exactly the ones for my case.

From the very first lines, the monitoring documentation is daunting with its long list:

The workflow of receiving a trap:
    1. snmptrapd receives a trap
    2. snmptrapd passes the trap to SNMPTT or calls Perl trap receiver
    3. SNMPTT or Perl trap receiver parses, formats and writes the trap to a file
    4. Zabbix SNMP trapper reads and parses the trap file
    5. For each trap Zabbix finds all “SNMP trapper” items with host interfaces matching the received trap address. Note that only the selected “IP” or “DNS” in host interface is used during the matching.
    6. For each found item, the trap is compared to regexp in “snmptrap[regexp]”. The trap is set as the value of all matched items. If no matching item is found and there is an “snmptrap.fallback” item, the trap is set as the value of that.
    7. If the trap was not set as the value of any item, Zabbix by default logs the unmatched trap. (This is configured by “Log unmatched SNMP traps” in Administration → General → Other.)
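
Steps 5-7 of this workflow boil down to a small routing decision. Below is a minimal Python sketch of that logic, assuming hypothetical item keys on a single host. It only illustrates the order of matching described in the documentation and is not how Zabbix is actually implemented; in particular, Python's re module stands in for whatever regex engine Zabbix uses.

```python
import re

def route_trap(trap_text, item_keys):
    """Return the item keys on one host that would receive this trap."""
    matched = []
    for key in item_keys:
        m = re.fullmatch(r'snmptrap\[(.*)\]', key)
        if m is None:
            continue  # snmptrap.fallback and non-trap items are handled below
        pattern = m.group(1).strip('"')  # key parameters may be quoted
        if re.search(pattern, trap_text):
            matched.append(key)
    # Step 6: the fallback item is used only when nothing else matched
    if not matched and "snmptrap.fallback" in item_keys:
        matched.append("snmptrap.fallback")
    return matched  # empty list: step 7, the trap would be logged as unmatched

keys = ['snmptrap["OSPF"]', "snmptrap.fallback"]
print(route_trap("OSPF neighbor changed state", keys))
print(route_trap("linkDown on some interface", keys))
```

The first call lands in the regexp item, the second in the fallback item; with no fallback item on the host the second trap would simply be unmatched.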

That is, one daemon receives the TRAP and hands it to another daemon, which parses it and writes it to a log in the required format; Zabbix then reads the log and decides what to do next. This already looks no simpler than configuring SNMP contexts everywhere by hand, but oh well, let's try. Reading the monitoring docs carefully, we realize that they alone are not enough to set anything up. Zabbix has this habit: the documentation describes the system's features and nuances so tersely that it confuses more than it teaches. They can be understood, though - the software is free, they have to earn money somehow, and they earn it on support. There are articles on the Internet describing how to set it all up in no time, but I could not get everything working from any single article; I had to collect information bit by bit from various sources. Enough lyrics, let's get to the hardcore part.

Configuring the network device

Before you touch anything on the monitoring host, I strongly recommend that you first configure the network device and make sure that TRAPs really fly from the device to the server. At first I did not check this, and it cost me a lot of nerves, blood and time. I have a pile of Cisco Nexus switches at hand, so I will give examples for this series. If you have Catalyst, ASR, ASA and so on - sorry, I am not the sun, I cannot warm everyone; read the docs on how to configure those yourself. The syntax will be similar, but with its own nuances.

snmp-server contact
snmp-server location Room1
snmp-server source-interface traps loopback1

This matters later when configuring TRAPs in Zabbix: the address the TRAP is sent from must match the SNMP interface address in the host settings in Zabbix.

snmp-server user Zabbix network-operator auth sha string priv aes-128 string

Use protocol version 3 wherever possible, in authPriv mode (encryption and authentication); it is not as difficult to configure as it seems. Forget about versions 1 and 2 of the protocol: with no encryption and essentially no authentication in these versions, an unexpected incident is just a matter of time (the community string is transmitted in clear text, and moreover, I regularly see that it is public/private). The network-operator role gives the user read-only rights.

snmp-server host traps version 3 priv Zabbix
snmp-server host use-vrf default
snmp-server host loopback1
no snmp-server enable traps ospf lsa
snmp-server enable traps ospf
no snmp-server enable traps entity entity_mib_change
no snmp-server enable traps entity entity_module_status_change
no snmp-server enable traps entity entity_power_status_change
no snmp-server enable traps entity entity_module_inserted
no snmp-server enable traps entity entity_module_removed
no snmp-server enable traps entity entity_unrecognised_module
no snmp-server enable traps entity entity_fan_status_change
no snmp-server enable traps entity entity_power_out_change
no snmp-server enable traps link linkDown
no snmp-server enable traps link linkUp
no snmp-server enable traps link extended-linkDown
no snmp-server enable traps link extended-linkUp
no snmp-server enable traps link cieLinkDown
no snmp-server enable traps link cieLinkUp
no snmp-server enable traps link connUnitPortStatusChange
no snmp-server enable traps bfd session-up
no snmp-server enable traps link delayed-link-state-change
no snmp-server enable traps bfd session-down
no snmp-server enable traps rf redundancy_framework
no snmp-server enable traps license notify-license-expiry
no snmp-server enable traps license notify-no-license-for-feature
no snmp-server enable traps license notify-licensefile-missing
no snmp-server enable traps license notify-license-expiry-warning
no snmp-server enable traps upgrade UpgradeOpNotifyOnCompletion
no snmp-server enable traps upgrade UpgradeJobStatusNotify
no snmp-server enable traps rmon risingAlarm
no snmp-server enable traps rmon fallingAlarm
no snmp-server enable traps rmon hcRisingAlarm
no snmp-server enable traps rmon hcFallingAlarm
no snmp-server enable traps entity entity_sensor
no snmp-server enable traps generic coldStart
no snmp-server enable traps generic warmStart

I deliberately turned off all TRAPs except OSPF, so that when diagnosing why something is not working I would not have to wade through a lot of unnecessary information in the debug output.

How do you check that TRAPs work? Very simple - you need to break something. Start a sniffer on the monitoring host:

root@dc-zbx:~# tcpdump -i bond0 udp port 162
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0, link-type EN10MB (Ethernet), capture size 262144 bytes

Find live neighbors on the device:

SW# show ip ospf neighbors vrf all 
 OSPF Process ID 10 VRF default
 Total number of neighbors: 4
 Neighbor ID     Pri State            Up Time  Address         Interface
                           -          01:47:17 172.17.0.10     Vlan1427
                           -          18w1d                    Vlan1426
                           -          5w0d                     Vlan1473
                           -          3d00h                    Vlan1404
 OSPF Process ID 100 VRF OSPF100
 Total number of neighbors: 4
 Neighbor ID     Pri State            Up Time  Address         Interface
                           -          5w0d                     Vlan1474
                           -          13w3d                    Vlan1479
                           -          13w3d                    Vlan1477
                           -          3d00h                    Vlan1405
 OSPF Process ID 200 VRF Dia
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
                           -          17w2d                    Vlan1450
                           -          17w0d                    Vlan1452
 OSPF Process ID 216 VRF Dev
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
                           -          18:59:59 10.216.0.73     Vlan1641
                           -          18:59:54 10.216.0.82     Vlan1643

And drop one of them:

interface vlan 1643

We see in the sniffer:

11:08:20.001942 IP >  F=ap U="Zabbix" [!scoped PDU]39_d1_7c_19_b3_d9_f8_31_32_8e_c9_39_c2_3a_db_d8_28_26_c6_0b_01_55_b6_fa_5e_f5_38_66_f9_6f_3f_c0_98_cb_57_93_5a_50_8e_50_90_79_f3_9b_ec_ec_d7_9f_e8_ac_f6_fd_79_ac_95_ff_71_73_32_70_52_66_a5_7d_b3_c4_39_d0_1c_7f_a6_38_ea_d7_61_c0_2f_12_ee_db_d9_07_40_8c_a8_48_57_e9_e5_56_12_3f_ec_f9_34_65_09_96_86_f6_d2_93_06_45_fa_95_ea_36_5a_82_2f_30_8f_02_03_59_07_5f_d8_a6_1c_f2_5a_be_7d_09_15_ef_05_00_83_fd_ea_ac_2a_3b_86_0f_86_e5_3b_93_3a_68_6d_33_99_e2_46_2b_9d_6a_1e_5d_9e_d9_93_56_51_5e_ff_9e_77_4c_cb

If you see nothing in the sniffer, diagnose that first; otherwise there is no point in going any further, because you will not be able to tell at which stage something is not working.
If there is no device at hand or you cannot touch production, a TRAP can be generated from any other machine, for example like this:

snmptrap -v 1 -c neveruseme'.'''633'55' . s "teststring000"
snmptrap -v3 -l authPriv -u Zabbix -a SHA -A abyrvalg -x AES -X pechka -e 0x8000000001020305 linkUp.0


We will need these packages on the system:

apt install snmp snmp-mibs-downloader snmpd snmptrapd snmptt

I did not go with the Perl trap receiver and chose SNMPTT instead, for personal and subjective reasons. So, the docs say:

1. snmptrapd receives a trap

We have to start with its configuration, not rush straight into creating an Item in the Zabbix web interface. Why? You need to climb the same steps the TRAP goes through. In the previous section we made sure that the TRAP arrives from the device at all; now we will make sure that it is at least accepted by the first daemon, snmptrapd. I remember setting up postfix + dovecot + something else a long time ago. I spent about two weeks on it: there, too, one daemon accepts the connection, another parses the message, a third puts it in the queue, a fourth delivers it to the user's folder, and so on, and nothing worked. All because I was configuring it from the middle, then from the end, then from the beginning, when I should have started with telnet to port 25 and watched the debug output of the listener. </lyrics>

Open /etc/snmp/snmptrapd.conf and delete, or better comment out, everything we do not understand or care about, leaving one line:


Stop the service

systemctl stop snmptrapd.service

Run in manual mode

root@dc-zbx:~# snmptrapd -f -Lo
NET-SNMP version 5.7.3 AgentX subagent connected
NET-SNMP version 5.7.3

Again we break OSPF as in the example above and see:

2018-07-20 11:38:38 UNKNOWN [UDP: []:22095->[]:162]:
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1355817272) 156 days, 22:09:32.72	SNMPv2-MIB::snmpTrapOID.0 = OID: OSPF-TRAP-MIB::ospfNbrStateChange	OSPF-MIB::ospfRouterId = IpAddress:	OSPF-MIB::ospfNbrIpAddr = IpAddress:	OSPF-MIB::ospfNbrAddressLessIndex = INTEGER: 0	OSPF-MIB::ospfNbrRtrId = IpAddress:	OSPF-MIB::ospfNbrState = INTEGER: down(1)

If we see nothing, we look for the reason why. If you want the same readable entries rather than raw numeric OIDs, add the following to /etc/snmp/snmp.conf:

mibs +OSPF-MIB
mibs +OSPFV3-MIB
mibdirs +/usr/share/snmp/mibs/ietf/

And restart snmpd:

systemctl restart snmpd.service 

How to download MIB files with the least pain and feed them to snmpd is covered in more detail here.

Now let's add authentication; open /etc/snmp/snmptrapd.conf again:

traphandle default snmptthandler
#disableAuthorization yes#
createUser -e 0x80000009038d604a6a82a3 Zabbix SHA string AES
authuser log,execute,net Zabbix

-e 0x80000009038d604a6a82a3 is the engineID; you can look it up on the network device:

SW# sh snmp engineID
Local SNMP engineID: [Hex] 80000009038F604D6A82A1
                     [Dec] 128:040:000:109:003:140:096:079:106:131:160

Repeat the experiment once more, this time also with USM debug output:

root@dc-zbx:~# snmptrapd -f -Lo -Dusm
registered debug token usm, 1
usmUser: created a new user Zabbix at 80000009038F 604F 6B 82 A5 
NET-SNMP version 5.7.3 AgentX subagent connected
NET-SNMP version 5.7.3
usm: USM processing begun...
usm: match on user Zabbix
usm: no match on engineID (80000009038F 604F 6B 82 A5 )
usm: match on user Zabbix
usm: Verification succeeded.
usm: USM processing completed.
2018-07-20 11:50:07 UNKNOWN [UDP: []:22095->[]:162]:
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1355886163) 156 days, 22:21:01.63	SNMPv2-MIB::snmpTrapOID.0 = OID: OSPF-TRAP-MIB::ospfNbrStateChange	OSPF-MIB::ospfRouterId = IpAddress:	OSPF-MIB::ospfNbrIpAddr = IpAddress:	OSPF-MIB::ospfNbrAddressLessIndex = INTEGER: 0	OSPF-MIB::ospfNbrRtrId = IpAddress:	OSPF-MIB::ospfNbrState = INTEGER: down(1)

If at this stage you see authorization errors in the debug output, carefully check the engineID and that the users created on the device match those we put in the /etc/snmp/snmptrapd.conf configuration file. And yes, by the way: for each device you will have to create a separate user with that device's engineID, or manually set the same engineID on all devices, if the devices allow it.

I see this line in the debug output:

usm: no match on engineID (80000009038F 604F 6B 82 A5 )

I did not figure out why it appears, since the TRAP is nevertheless accepted and passed on for further processing. If you know what I did wrong, please tell me in the comments.

Now let's take on SNMPTT. It has two config files, an ini and a conf. The first defines the operating parameters of the daemon itself; the second defines how each specific trap is received and processed.

Open /etc/snmp/snmptt.ini and set the following:

mode = daemon
net_snmp_perl_enable = 1
date_time_format = %Y %m %d %H:%M:%S

The date and time format is up to you; the main thing is to use the same one everywhere.

log_file = /var/log/snmptt/snmptt.log
log_system_file = /var/log/snmptt/snmpttsystem.log
unknown_trap_log_enable = 1
unknown_trap_log_file = /var/log/snmptt/snmpttunknown.log

Why is the log path not the same as in many articles on the Internet? Because the docs say: “If systemd parameter PrivateTmp is used, this file is unlikely to work in /tmp.” I do not want to step on a rake I have been warned about beforehand, so I change it to a sane path right away.

Next, open /etc/snmp/snmptt.conf, remove everything we do not need and/or do not understand, and leave only this:

EVENT ospfNbrStateChange ."OSPF" Normal
FORMAT ZBXTRAP $aA OSPF neighbor with IP addr $2 changed state to $5

It has to look exactly like this, because Zabbix expects this format in the log. Where $2 and $5 come from becomes clear if you look at the format of the TRAP message:

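Object: ospfNbrStateChange
Trap components: ospfRouterId, ospfNbrIpAddr, ospfNbrAddressLessIndex, ospfNbrRtrId, ospfNbrState

These trap components are the parameters that can be substituted into the log format, in order, as $1, $2, ... So $2 is ospfNbrIpAddr and $5 is ospfNbrState.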
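
To make the $aA, $2 and $5 placeholders concrete, here is a small Python sketch of the substitution. It is only an illustration under assumptions: the function name and all addresses are made up, real SNMPTT supports many more variables, and whether the state arrives as a name or a number depends on how SNMPTT is configured.

```python
import re

def expand_format(fmt, agent_addr, varbinds):
    """Expand $aA and $1, $2, ... the way the FORMAT line above uses them."""
    def substitute(match):
        token = match.group(1)
        if token == "aA":
            return agent_addr              # address of the device that sent the trap
        return varbinds[int(token) - 1]    # $n is the n-th trap component, 1-based
    return re.sub(r"\$(aA|\d+)", substitute, fmt)

# Trap components in the order they appear in the ospfNbrStateChange trap;
# every address here is a made-up documentation value.
varbinds = [
    "192.0.2.1",    # $1 ospfRouterId
    "192.0.2.82",   # $2 ospfNbrIpAddr
    "0",            # $3 ospfNbrAddressLessIndex
    "192.0.2.2",    # $4 ospfNbrRtrId
    "down",         # $5 ospfNbrState
]
fmt = "ZBXTRAP $aA OSPF neighbor with IP addr $2 changed state to $5"
print(expand_format(fmt, "192.0.2.10", varbinds))
```

With these sample values the line becomes "ZBXTRAP 192.0.2.10 OSPF neighbor with IP addr 192.0.2.82 changed state to down".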

While fighting with all this, I noticed that after changing the SNMPTT settings the changes did not seem to apply. It turned out that after changing them you have to restart not snmptt.service but snmpd.service - this nuance cost me a fair amount of blood and nerves during debugging.

Check that all the daemons are running:

systemctl status snmpd snmptrapd snmptt

If everything is OK, break OSPF again and look into the /var/log/snmptt/snmptt.log log; it will contain something like this:

2018 07 19 15:10: "OSPF"
2018 07 19 15:12: "OSPF"
2018 07 19 15:12: "OSPF"
2018 07 19 15:22: "OSPF"
2018 07 19 15:25: "OSPF"

TRAPs that we have not configured in /etc/snmp/snmptt.conf end up in /var/log/snmptt/snmpttunknown.log, but only from devices for which the correct user and engineID are configured in /etc/snmp/snmptrapd.conf. That is, TRAPs from unknown devices will be silently dropped. If you want theory and a detailed breakdown, there is a surprisingly sane net-snmp doc, which also has a good description of the difference between TRAP and INFORM. Looking ahead: it is better to use INFORM, because it has some kind of delivery control, and it likewise works over SNMP over UDP.

And only now do we get to configuring the monitoring itself.

Zabbix configuration

First of all, make sure that the /etc/zabbix/zabbix_server.conf configuration file points the server at the correct SNMPTT log (the SNMPTrapperFile parameter) and that Zabbix starts at least one SNMP trapper (the StartSNMPTrapper parameter):

At first I created the Item directly on the host to catch the special effects quickly and easily, but here I will describe right away how to create a Template, because templates should be used whenever possible. From here on I will show pictures - the copy-paste freebie is over - but I will point out the places that need attention.

Create a template:

Here we just give it a sane name.

Create an Item:

Important: what is specified in square brackets in the key is what Zabbix will look for in the log. We set up the log format in /etc/snmp/snmptt.conf and wrote there:




EVENT ospfNbrStateChange ."OSPF" Normal
FORMAT ZBXTRAP $aA OSPF neighbor with IP addr $2 changed state to $5

And the magic word "OSPF" does indeed appear in the log:

2018 07 19 15:25: "OSPF"

We defined the date format in the /etc/snmp/snmptt.ini config:

date_time_format = %Y %m %d %H:%M:%S

As I wrote above, use any format that is convenient for you; the main thing is that it matches in all the right places.
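
As a quick sanity check of what a given format string produces, the same strftime-style directives can be tried in Python. This is just an illustration with a made-up timestamp; it assumes the directives behave the same way in SNMPTT as in Python, which holds for the common ones used here.

```python
from datetime import datetime

# Same string as date_time_format in /etc/snmp/snmptt.ini
DATE_TIME_FORMAT = "%Y %m %d %H:%M:%S"

# Render a sample moment the way it would appear at the start of a log line
stamp = datetime(2018, 7, 19, 15, 25, 0).strftime(DATE_TIME_FORMAT)
print(stamp)

# Parsing with the same format round-trips; a mismatched format raises ValueError
parsed = datetime.strptime(stamp, DATE_TIME_FORMAT)
print(parsed == datetime(2018, 7, 19, 15, 25, 0))
```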

Creating a Trigger


A neighbor can be in one of several states:

1 : down
2 : attempt
3 : init
4 : twoWay
5 : exchangeStart
6 : exchange
7 : loading
8 : full

By and large, it does not matter which state the neighbor is in if that state is not FULL: to diagnose the problem you will still have to log in to the device, read the logs and enter some commands. So there will be a single trigger, and it will fire only when the neighbor state in the TRAP is not FULL.
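
The decision this trigger makes can be sketched in a few lines of Python. This is not the Zabbix trigger expression itself, only the logic behind it; the function names are hypothetical, and the sketch assumes the trap text ends with "changed state to <state>" as in the FORMAT line from snmptt.conf, with the state given either by name or by number.

```python
import re

# ospfNbrState values from the list above
OSPF_NBR_STATES = {
    1: "down", 2: "attempt", 3: "init", 4: "twoWay",
    5: "exchangeStart", 6: "exchange", 7: "loading", 8: "full",
}

def neighbor_state(trap_text):
    """Extract the state name from a 'changed state to ...' line, or None."""
    m = re.search(r"changed state to (\w+)", trap_text)
    if m is None:
        return None
    token = m.group(1)
    if token.isdigit():
        return OSPF_NBR_STATES.get(int(token))  # numeric form, e.g. 8
    return token                                # symbolic form, e.g. full

def should_alarm(trap_text):
    """Alarm on any recognized neighbor state other than FULL."""
    state = neighbor_state(trap_text)
    return state is not None and state.lower() != "full"

print(should_alarm("OSPF neighbor with IP addr 192.0.2.82 changed state to down"))
print(should_alarm("OSPF neighbor with IP addr 192.0.2.82 changed state to 8"))
```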

Before linking the template to a specific host, make sure that the host has an SNMP interface configured with the correct IP address; otherwise the traps will land in /var/log/snmptt/snmptt.log, but Zabbix will not tie them to the host. In that case the Zabbix server log /var/log/zabbix/zabbix_server.log will contain a message like:

19972:20180720:091722.896 unmatched trap received from "": 2018 07 20 09:17: "OSPF"

Go to Latest data and see:


The trigger fired as well.


Now let's bring down two neighbors.


In the dashboard we see two problems, which is good, and with alerting configured two emails about them will arrive as well.

Everything is great, everything works, and here is the cherry on the cake at the end.

Now we bring one neighbor back up. Both problems disappear from the dashboard at once. This is not a bug, it is a feature. I noticed this nuance by accident while testing the template. The upshot is that if several neighbors go down and then one of them comes back up, or even if a neighbor comes up that never existed before, the monitoring turns green.
Of course, you can configure an Item by hand to track a specific neighbor, you can script something, or you can go back to the SNMP contexts from the very beginning of the article. Another idea is to write a script that goes to the network devices over SSH/API, collects information about all neighbors, takes a "working" snapshot, analyzes the diff between checks and writes what is wrong to a log, which can then be fed to the monitoring system... complicated. What I wanted was a minimum of crutches and custom code. If you know a sane way to solve this problem, or think I did everything wrong, then again, please write in the comments, or better yet, in a reply article.
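
Why everything turns green can be shown with a tiny simulation. The Python sketch below is an illustration under assumptions, not Zabbix code: the first function mimics a single trapper item whose trigger only looks at the most recent value, while the second is a hypothetical alternative that remembers the last state of each neighbor separately. The addresses are made up.

```python
import re

TRAP_RE = re.compile(r"IP addr (\S+) changed state to (\w+)")

def single_item_problem(trap_values):
    """One trapper item for all neighbors: only the latest value counts."""
    if not trap_values:
        return False
    m = TRAP_RE.search(trap_values[-1])
    return m is not None and m.group(2).lower() != "full"

def per_neighbor_problems(trap_values):
    """Hypothetical alternative: remember the last state of every neighbor."""
    last_state = {}
    for value in trap_values:
        m = TRAP_RE.search(value)
        if m:
            last_state[m.group(1)] = m.group(2).lower()
    return sorted(ip for ip, state in last_state.items() if state != "full")

history = [
    "OSPF neighbor with IP addr 192.0.2.73 changed state to down",
    "OSPF neighbor with IP addr 192.0.2.82 changed state to down",
    "OSPF neighbor with IP addr 192.0.2.73 changed state to full",
]
print(single_item_problem(history))    # False: the dashboard is green
print(per_neighbor_problems(history))  # ['192.0.2.82']: one neighbor is still down
```

The last trap says "full", so the single item looks healthy even though the second neighbor never came back; keeping state per neighbor is exactly what the single-item template cannot do.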

UPD: colleagues advised me to dig in after all and try to implement the plan using SNMP contexts. Where there is demand, there will be supply. Looking ahead, I can say that the devil is not as black as he is painted; let's go.
On the network device, enter the magic command:

snmp-server context {snmp context name} instance {protocol instance} vrf {vrf name}

The parameter names require some explanation.
{snmp context name} - the name of the SNMP context that we will use in requests.
{protocol instance} and {vrf name} are taken from the configuration of the OSPF process:

router ospf {protocol instance}
vrf {vrf name}

I was afraid that these settings would break the Items already configured over SNMP with an empty context, but I checked: the setting affects only the output of OSPF-MIB data; for example, everything from the IF-MIB section is still returned with an empty context as before. If you do not have a Nexus, I recommend double-checking this point - the behavior may well be different.

Now let's tweak the template in Zabbix.
Create a new Discovery rule that specifies the context:


A new Item prototype, also specifying the context:


And two triggers - the first one alarms if the neighbor is in any state except FULL:


and the second - if the neighbor has disappeared:

