How I read sensor data via SNMP (Python + AgentX + systemd + Raspberry Pi) and built yet another monitor

    Hello.


    Lyrical digression
    This article sat in drafts for a couple of weeks, because there was no time to finish the device it describes. But under the onslaught of comrades who have already covered half of what I wanted to say in their own articles, I decided to follow the principle of "release fast, release early, release crap" and publish what I have. Besides, the development is 80% complete.

    A lot of time has passed since the publication of the article about the Universal Control Unit (more than a year, to be precise). A lot, but not enough for me to write proper software for the device. After all, it is not there for beauty: it has to collect data from sensors and make sure that data ends up in the monitoring system (in my case, Zabbix).

    Part one - software


    Over the past year, the following has been implemented on the software side:

    • A test script demonstrating that everything connected actually works
    • A script for Zabbix that collects readings from the temperature sensors

    There were attempts to write separate monitors for ntpd and for gpsd. A lot of time went into a super-monitor that was supposed to read a config, start data-collection processes for various sources according to that config, gather data from those processes and display readings on the screen, while also letting Zabbix read the data. In practice I managed to implement a process manager that read the config and spawned the necessary processes, plus the on-screen drawing, which turned out very nicely: it can even read the layout from the config and change the screen contents on a timer, collecting data from the processes at the moment it is needed. Only one thing is missing from this super-monitor: the actual processes that would collect the data. There were also ideas for a signal system that would let me assign functions to buttons.

    For a while I put off developing full-fledged software. Well, the script works, and the rule "if it works, don't touch it" is, as they say, the sacred rule of the administrator. But here's the catch: the more you want to monitor, the more scripts you need to write and the more exceptions you need to add to SELinux for Zabbix (I monitor more than just the raspi). In the default policy, zabbix (like rsyslog, for example) is not allowed to call arbitrary programs, and understandably so. I really did not want to disable SELinux for Zabbix entirely, or to write my own policy for every binary it would invoke. So I had to think.

    So let's figure out how data can be collected into a monitoring system:

    • By who initiates the transfer:

      • Active monitoring: the monitored node initiates the data transfer (push)
      • Passive monitoring: the monitoring node initiates the data transfer (pull)

    • By data collection method:

      • Via an agent on the monitored node, using only the metrics the agent supports
      • Via an agent on the monitored node, extending the agent with scripts
      • SNMP
      • Primitive ping
      • Via telnet
      • … etc

    I use pull monitoring, not for religious reasons; it just happened that way. In fact, there is not much difference between push and pull, especially under light load (at one of my previous jobs I ran Nagios + NSCA and did not notice a big difference: I still had to create items by hand). I could have used zabbix_sender if I already had push monitoring in place, but I don't, and mixing one with the other feels messy anyway. As for which protocol to monitor over, the choice looks large but really isn't: discovery is supported only through the agent or through SNMP, which leaves just two options. The agent is out because of the SELinux problem described above. Voilà: we are left with pull monitoring over SNMP.

    Hurray! But why hurray? Linux does have snmpd, but how do we make it serve data that snmpd itself has no idea about? It turns out snmpd offers as many as three (fundamentally different) ways to serve arbitrary data at arbitrary OIDs:

    • Running an external script (the exec / sh / execfix / extend / extendfix / pass / pass_persist directives). Bad because of the potential SELinux problems and because you end up with an uncontrolled pile of scripts. They also say pass_persist handles binary data poorly. Maybe they are shamelessly lying, I don't know, but I didn't like the idea of breeding a million scripts anyway;

    • Writing something in the embedded Perl, or loading an .so. I don't know Perl and don't want to, I don't want to write .so files, and I'm not enough of a programmer to write in C;

    • Getting data from an external agent (proxy, SMUX, AgentX). Now this sounds good: loose coupling, independent of language. Let's dig in:

      • proxy - request an OID from an SNMP agent on a specified host. I would have to implement the entire SNMP protocol, which is of no use to me, and why ask another node and touch the network when I want to receive the data locally? Yes, I know 127.0.0.1 exists, but either way, speaking SNMP myself does not appeal to me at all;

      • SMUX - the calling agent needs SMUX support too, and man says net-snmp is built without SMUX support by default (I already had to rebuild ntpd for PPS support; rebuilding net-snmp on the raspi as well does not appeal to me). Besides, SMUX is just a wrapper around SNMP packets that merely adds the ability for a subagent to register with the agent;

      • AgentX - essentially the same as SMUX, only the protocol is simpler and the packets are lighter. And it is compiled into net-snmp by default, which is also nice. Sounds like our choice (the couple of snmpd.conf lines needed to enable it are sketched below the list).
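
    For reference, turning snmpd into an AgentX master takes a couple of lines in snmpd.conf (the socket path shown is the usual net-snmp default; adjust to taste):

        # snmpd.conf: let subagents register their OID subtrees with us
        master agentx
        # where the AgentX socket lives (this is the common default)
        agentXSocket /var/agentx/master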

    I write in Python, so I went looking for an existing implementation of the AgentX protocol. And such good people exist: https://github.com/rayed/pyagentx and https://github.com/pief/python-netsnmpagent. The second project seems livelier, but the first looked simpler. I started with the first one (pyagentx); it works and does everything needed. But when I began thinking about how to feed data into this library, I was tempted to look at the second package (python-netsnmpagent) after all. The problem with pyagentx is that, the way it is written, it cannot receive data from calling code: the request for fresh data has to happen directly inside the function that sends updates to snmpd, which is not always convenient and not always possible. I could, of course, have subclassed things and overridden functions, but in practice that would mean rewriting the class almost entirely, which I also did not want to do: we are developing on the knee here, everything should be simple and quick.
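
    To illustrate the point: in pyagentx you subclass its Updater, and the library itself periodically calls your update() method, so fetching fresh data has to happen right there. A minimal sketch, roughly after the project's README (the OID subtree and read_temperature() are placeholders of mine):

        import pyagentx

        def read_temperature():
            # hypothetical data source: with pyagentx the fetch has to
            # live here, inside the update cycle; nothing can push data in
            return 2150  # hundredths of a degree C

        class TempUpdater(pyagentx.Updater):
            def update(self):
                # called by the library on its own schedule
                self.set_INTEGER('1.0', read_temperature())

        pyagentx.setup_logging()
        agent = pyagentx.Agent()
        # placeholder enterprise OID; '1.0' above is relative to this subtree
        agent.register('1.3.6.1.4.1.99999.1', TempUpdater, freq=10)
        agent.start()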

    The next question was what the architecture should look like. An attempt to write a dispatcher that forks data sources and reads data from them had already been made and did not end well (see above), so I decided to abandon my own dispatcher implementation. And it so happened that either I saw an article about systemd somewhere, or an old itch to get to know it better flared up again, and I decided that in my case the dispatcher would be systemd. Haters gonna hate, but it is already on the raspi out of the box, so let's put it to use.

    Here are the systemd features I found useful for myself:

    • Free daemonization - write a service unit of type simple (or notify) and you get a daemon without writing a single line of code for it. Goodbye python-daemon and/or daemonize
    • Automatic restart of crashed units - no comments needed here, it saves you from intermittent errors
    • Socket activation, and socket management in general - it is very nice that whoever wants to write to a socket can do so even if whoever reads from it is not ready yet. Moreover, the reader can be activated by a write to the socket, which can save some RAM (not that there wasn't enough of it...)
    • Template units - if I have many identical sensors, I can spawn many processes from one unit, pass each one a different parameter, and enjoy
    • (discovered too late, not yet implemented) timer units - they let you run a unit periodically. Why not cron? Because cron's minimum period is 1 minute and I want to poll the sensors more often. Why not sleep()? Because the waiting is active, and because the period starts to drift: yes, we poll the sensor every N seconds, but with reading and processing the data the update period is not N seconds but N + x, i.e., every reading shifts the schedule by another x (see the sketch after this list)
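
    For the record, a minimal sketch of how that pair might look (unit names, paths and the 15-second period are mine, for illustration): a template service polls one sensor, and a matching template timer fires it more often than cron ever could.

        # sensor@.service -- one instance per sensor id (the %i specifier)
        [Unit]
        Description=Poll sensor %i

        [Service]
        Type=oneshot
        ExecStart=/usr/local/bin/read-sensor.py %i

        # sensor@.timer -- enable per instance, e.g.:
        #   systemctl enable --now sensor@28-0000075a2d1c.timer
        [Timer]
        OnBootSec=15
        OnUnitActiveSec=15

        [Install]
        WantedBy=timers.target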

    With these findings in mind, the architecture took shape:

    • systemd opens a socket for communication between the sensor processes and the collector process; all sensor processes write to the same socket
    • systemd launches the units for the sensor processes
    • a sensor process reads data from its sensor, writes it to the socket and goes to sleep (I had not yet discovered systemd timer units at that point)
    • as soon as data from some sensor is written to the socket, systemd starts the collector process, which receives the update from the sensor, magically processes it and saves it into its internal state. The collector process does not die
    • the collector process spawns a separate thread (a thread, not a process, to avoid inter-process IPC, which in Python is somewhat sad for this task; I will explain below why I think so) in which the internal state is handed over to snmpd via the AgentX protocol (a sketch of the socket wiring follows this list)
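
    Under these assumptions, the socket wiring might look like this minimal sketch (unit names and the socket path are mine, for illustration): systemd owns the rendezvous point, and the collector is started on the first write.

        # collector.socket -- systemd owns the socket; sensors can write
        # to it even when the collector is not running yet
        [Socket]
        ListenDatagram=/run/sensors.sock

        [Install]
        WantedBy=sockets.target

        # collector.service -- activated by the first datagram; receives
        # the listening socket as fd 3 (see sd_listen_fds / $LISTEN_FDS)
        [Unit]
        Description=Sensor data collector

        [Service]
        Type=simple
        ExecStart=/usr/local/bin/collector.py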

    One very bad spot is the internal state shared between the collector thread and the agentx thread. But I forgave myself for this, because Python has the magic GIL, which solves the synchronization issue between the two threads. Although this, of course, is very bad and not by the book. There was an idea to move the shared state into a separate process and have the agentx process and the collector process talk to the state process through a socket, but that would mean yet another socket, yet another unit, and so on.
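
    If I ever stop forgiving myself, the honest minimal fix is an explicit lock around the shared dictionary instead of trusting the GIL; a sketch of what I mean (names are mine):

        import threading

        state = {}                     # sensor name -> latest reading
        state_lock = threading.Lock()  # compound updates are not atomic under the GIL

        def collector_update(name, value):
            # called from the collector thread for every received update
            with state_lock:
                state[name] = value

        def agentx_snapshot():
            # called from the agentx thread when it is time to feed snmpd
            with state_lock:
                return dict(state)  # copy, so the lock is not held while encoding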

    Why I did not like Python IPC for this particular task:

    • Queue (from multiprocessing) works fine, but these queues are unnamed: the Queue instance must be passed into the forked process. In my case this means completely rewriting pyagentx
    • A Manager might solve my problem, but again, that means completely rewriting pyagentx
    • posix/sysv IPC is great, there are named queues there, but those queues are limited in size, and on some systems very badly limited (they write [scroll to "Usage tips"] that on macOS, for example, a queue cannot exceed 2 KB, and this cannot even be configured). Not that I have to run on a bunch of different systems with varying degrees of wretchedness in their sysv IPC implementations, but I did not want to do any tuning either. I want it to just work, right away
    • posix/sysv IPC again - the queues are blocking, i.e., there has to be some minimal timeout before a read from an empty queue returns. In the case of pyagentx, blocking on a queue read inside update() is highly undesirable, and it is wretched in general
    • and posix/sysv IPC yet again - there is a problem with naming the queues. Message queues are named, but not by a name: by a key. Since the key is neither hierarchical nor semantically meaningful, it is easy to pick a non-unique one. The Python implementation of posix/sysv IPC can generate a queue key automatically, but here's the problem: if I could pass something into pyagentx, I would pass it a Queue and not suffer at all. You could generate a key with ftok, but they write [scroll to "Usage tips"] that ftok gives no more confidence in the uniqueness of a key than int random() {return 4;}
    • (nothing else came to mind that would not involve an external queue broker, and the task is not big enough to justify keeping a queue broker around: an extra service, an extra headache)

    dbus looked like the solution to all my troubles, and it exists wherever systemd does, but the trouble is that pydbus requires GLib >= 2.46 to publish an API, and raspbian only has 2.42. dbus-python is deprecated and unsupported. In short, until push really comes to shove, I will keep sharing state the unsafe way.

    When using SNMP for your own dirty purposes there is another catch: how do you choose OIDs for your data sets? For that there is a special branch under private called enterprises: .1.3.6.1.4.1. You can get a unique enterprise ID from IANA. Once the OID scheme is defined, it is a good idea to write a MIB, so you don't forget what lives where, and to make life easier for monitoring systems. An introduction to writing MIBs is here.
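
    For a feel of what that looks like, here is a skeleton of a minimal MIB (the module name, the 99999 enterprise ID and the single object are placeholders; a real one would use the ID issued by IANA):

        MY-SENSORS-MIB DEFINITIONS ::= BEGIN

        IMPORTS
            MODULE-IDENTITY, OBJECT-TYPE, Integer32, enterprises
                FROM SNMPv2-SMI;

        mySensors MODULE-IDENTITY
            LAST-UPDATED "201801010000Z"
            ORGANIZATION "home lab"
            CONTACT-INFO "me@example.org"
            DESCRIPTION  "Readings from my DIY sensor box"
            ::= { enterprises 99999 }

        tempReading OBJECT-TYPE
            SYNTAX      Integer32
            MAX-ACCESS  read-only
            STATUS      current
            DESCRIPTION "Temperature, hundredths of a degree Celsius"
            ::= { mySensors 1 }

        END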

    At some point I discovered ntpsnmpd with a corresponding MIB and was overjoyed, but when I compiled this marvel I found that the author had only bothered to implement a few top-level constants and then ran out of steam. I dug around in the code a bit and never did figure out how cunningly the author talks to ntpd (or ntpq) to pull out those constants without parsing the output. One thing I understood for sure: there is no ready-made Python API, so there is nothing to be gained here, and I will have to implement this MIB myself.

    Five Minutes of Hate
    No, really: in all these years, has no one written analogs of ntpd, smartctl, lm_sensors and the other API-less utilities? Has no one bolted SNMP agents onto them? Analogs built so that you would not have to parse text output? I understand the Unix way and all that, but this is not about that. They could at least output data in a machine-readable format, but no, everything is for humans only. And judging by the wailing on the Internet (Russian and foreign alike), I am far from the only one unhappy about this. Well, lm_sensors can be forgiven, since the same data can be read from sysfs in machine-readable form, but the rest?

    In general, this whole contraption works and is quite tenacious. Discovery works in Zabbix, items get created, graphs get drawn, triggers send alerts: what else do you need to be happy? The code is not polished yet, so I will not publish it for now.

    Part two - hardware


    You can't always screw a unit to a case, but you don't want wires dangling off the walls either. There is a very elegant solution: a DIN rail. The market is full of rail-mounted enclosures into which you can put a rail-mount power supply (I use a MeanWell DR-15-5) and all kinds of breakers, which is just what I need. Accordingly, I wanted a DIN-rail case for the raspi. Two candidates were considered: a model from Italtronic and the RasPiBox. The advantage of the RasPiBox is that it comes with a prototyping board and takes power through screw terminals (via a regulator onto the GPIO), which is convenient but can be unsafe. But it costs more than three times as much, takes up more space on the rail and has no transparent window. The Italtronic model is not ideal either: it is so narrow that no ready-made 16x2 LCD fits inside, which sharply reduces the value of the transparent window, but for the low price I was ready to forgive that shortcoming.

    The case turned out to be quite convenient, with space for mounting (or rather, resting) two printed circuit boards or a sheet of anything. I make substrates from acrylic wrapped in non-conductive ESD protective film, cut with a dremel:

    The boards inside are held only by friction and by small ledges on both sides, i.e., no rigid fastening is provided. Despite its apparent size, the case is small and there is not much room above the raspi itself, especially if a board is inserted at the lower level. And I do need a board, since I have to fit several LEDs and the RTC module.

    Case Photos

    I want to connect temperature and humidity sensors to the new monitor. For temperature, our choice is the ds18b20. It works, but it is worth comparing its readings against a trusted thermometer: per the spec the sensor may be off by half a degree. To compensate, I added a primitive constant correction of the readings in the config, and checked against this thermometer:

    It turned out that my ds18b20 units did not really lie. But the next sensor does lie, by as much as 0.6 degrees. Again, though, it depends on the unit: one lied, the other almost did not.
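
    On the raspi the ds18b20 is exposed by the w1-gpio/w1-therm kernel modules through sysfs, so the "read plus constant correction" logic is a few lines of Python (the device id and the offset value are placeholders; the offset is the constant from my config):

        from pathlib import Path

        # the real id appears under /sys/bus/w1/devices/ as 28-xxxxxxxxxxxx
        SENSOR = Path('/sys/bus/w1/devices/28-0000075a2d1c/w1_slave')
        OFFSET = -0.6  # per-unit correction, found against a trusted thermometer

        def read_celsius():
            raw = SENSOR.read_text()
            # first line ends with the CRC verdict, second with e.g. "t=21562"
            if not raw.splitlines()[0].endswith('YES'):
                raise IOError('CRC check failed, retry the read')
            millidegrees = int(raw.rsplit('t=', 1)[1])
            return millidegrees / 1000.0 + OFFSET

        print(read_celsius())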

    Humidity was not so simple. Cheap sensors either do not work with the raspi at all (being analog), or have no libraries (and I want things to work right away), or cost as much as aviation cable. The compromise between convenience and frugality turned out to be the Adafruit BME280, which as a bonus also reports temperature along with pressure (though it can lie, as I noted above).

    While a ds18b20 can simply be wrapped in heat shrink, that trick will not work with the BME280. There were plenty of ideas for its case: leave it as is, solder the wires on and pot it in hot glue (it even has mounting ears, conveniently); make a mini-case from the same acrylic as the component substrates; print something on a 3D printer, since one is within reach... But then I remembered the eggs:

    This is the ideal case. There is enough room for the sensor, you can fit a connector, access for maintenance is convenient, and you can hang it anywhere or just toss it somewhere.

    I decided to connect the sensors to the raspi through DB9 connectors: USB ports are few, and an RJ45 socket does not fit dimensionally. The egg-sensor, though, I connected via USB cable, because offcuts of cut-up USB cables turned up in the cabinet, and waste not, want not:

    To protect the GPIO header on the raspi and to make the case easier to take apart, I took another header and soldered onto that. The header is right-angled, which bought a little vertical space, but I miscalculated slightly and it ended up buried in the LED resistors. Everything is tightly wrapped in heat shrink, of course, but it is a point worth remembering in the future. As a result, the halves of the case can still be separated, for example to change the battery in the RTC, or to replace the raspi itself. Everything else (namely, the flash drive) can be replaced without opening the case.

    Photos of half-preparedness and readiness

    One recommendation: do not skimp on buttons. I skimped, so my button not only bounces (that can be dealt with: the RPi.GPIO library provides debounce protection), it also only works in one very specific position. I provided the button for a programmatic shutdown of the device in case the power needs to be cut (I have killed the FS on the flash drive several times with sloppy shutdowns), but it turned out there was not much to provide for: I should also have read the documentation. If, like me, you do not read the documentation, then know this: the gpio_shutdown overlay does not do what you might expect at all; it merely drives some pin high/low on shutdown so that, for example, an external power supply can switch itself off. To shut the raspi down by button, there is the kernel module rpi_power_switch (but it has to be compiled, and that needs kernel-headers) or the userspace Adafruit-GPIO-Halt daemon. I will have my own hostd, which will blink the LEDs and react to the button as well (a sketch of the button handling follows).
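
    The button-handling part of that hostd is planned as roughly this minimal sketch (the pin number is mine; the debouncing is the bouncetime protection that RPi.GPIO provides):

        import signal
        import subprocess

        import RPi.GPIO as GPIO

        BUTTON = 21  # placeholder BCM pin; the button shorts it to ground

        GPIO.setmode(GPIO.BCM)
        GPIO.setup(BUTTON, GPIO.IN, pull_up_down=GPIO.PUD_UP)

        def on_press(channel):
            # clean shutdown instead of yanking power and killing the FS
            subprocess.call(['systemctl', 'poweroff'])

        # bouncetime (ms) is the library's built-in debounce protection
        GPIO.add_event_detect(BUTTON, GPIO.FALLING, callback=on_press, bouncetime=300)

        try:
            signal.pause()  # sleep forever; the callback runs in its own thread
        finally:
            GPIO.cleanup()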

    Conclusion


    The result is a hardware-software monitoring complex: expandable, built on current technology, and resistant to failures. Parts of the software can be updated and restarted independently of one another (thanks to systemd, this required no effort from me as a developer). And most importantly, the process and the result brought a lot of pleasure. Plus a small cartload of new knowledge.

    Thank you for reading!
