IBM/Lenovo Servers and the Watchdog: Episode II
I spent more than half a year investigating the watchdog on IBM/Lenovo servers under Linux, with help from IBM's hardware and software technical support. The beginning of this detective story was described in my article "SLES 12, the watchdog and IBM/Lenovo servers". Now the situation finally seems to be clear, and I can give constructive recommendations to the happy owners of IBM/Lenovo xSeries hardware.
So, first let us repeat the brief primer from the previous article. Server and industrial platforms include a special circuit, the watchdog timer. When activated, it starts counting down a preset interval (for example, one minute). If nothing accesses it during this time, a hardware reset is performed when the interval expires. If it is accessed, the countdown starts over. This makes it possible to recover a computer automatically when the operating system hangs or when the software providing some important service fails. Such a mechanism is mandatory in high availability (HA) clusters and other applications that require constant system availability. Computers with Intel architecture use several hardware watchdog timer interfaces, depending on the system manufacturer; the most common of them is Intel TCO (iTCO). On Linux, watchdog drivers are implemented as kernel modules that expose the timer to programs as the /dev/watchdog device.
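The countdown-and-restart cycle described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function name feed_watchdog and its arguments are hypothetical, and the final 'V' write relies on the "magic close" convention of the Linux watchdog API, which a driver honours only if it supports it and was not loaded with nowayout. Opening a real /dev/watchdog arms the timer, so try this against an ordinary file first.

```shell
#!/bin/sh
# Sketch of a userspace watchdog feeder (hypothetical helper, not from the article).
# WARNING: opening a real /dev/watchdog starts the countdown; if feeding stops
# without the "magic close" below, the machine will be reset.

# feed_watchdog DEVICE INTERVAL_SECONDS COUNT
feed_watchdog() {
    dev=$1
    interval=$2
    count=$3

    # Keep the device open on descriptor 3 for the whole session.
    exec 3>"$dev"

    i=0
    while [ "$i" -lt "$count" ]; do
        printf '.' >&3          # any write restarts the countdown
        sleep "$interval"
        i=$((i + 1))
    done

    # "Magic close": a 'V' written right before closing asks the driver
    # to disarm the timer instead of letting it expire.
    printf 'V' >&3
    exec 3>&-
}
```

For example, `feed_watchdog /tmp/fake-watchdog 1 5` writes five dots and a 'V' into an ordinary file, which is a harmless way to see the sequence a real feeder would send to the device.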
That is the end of the well-known part. The facts that follow are poorly documented on the Internet and are not well known even to the technical support of hardware and software vendors.
It is generally assumed that on hardware with Intel chipsets, including IBM's Intel-based servers (now manufactured by Lenovo), the watchdog timer is provided by the Intel TCO hardware and its supporting Linux kernel module, iTCO_wdt. On closer examination, the Intel TCO architecture itself has a rather significant drawback: in effect, the processor supervises itself. In theory, nothing should prevent code running in SMM from always doing its job, but in theory the operating system should not hang either, should it? A single point of hardware failure shared by the processor as the executor of programs and by its own watchdog timer does not look very good if you intend to build a highly reliable system.
However, I would probably never have gone into these details, or even learned about them, had the iTCO_wdt driver not turned out to be completely inoperable on IBM servers under SLES 12: the driver loads, but the /dev/watchdog device is not created, and a small inconspicuous message is left in the system log: “iTCO_wdt: unable to reset NO_REBOOT flag, device disabled by hardware/BIOS”.
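Because the message is so easy to miss, it may be worth checking for it explicitly. The following is a minimal sketch; the function name itco_status and its output words are hypothetical, and it simply searches whatever log text it receives on standard input for the message quoted above.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and reports whether
# iTCO_wdt complained that the hardware/BIOS keeps the watchdog disabled.
itco_status() {
    if grep -q 'iTCO_wdt.*unable to reset NO_REBOOT flag'; then
        echo "disabled"
    else
        echo "not-reported"
    fi
}
```

A typical use would be `dmesg | itco_status`, combined with `[ -c /dev/watchdog ] || echo "no /dev/watchdog"` to confirm whether the device node exists at all.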
At first I thought this was a regression in SLES 12 compared to SLES 11, since in SLES 11 the /dev/watchdog device was available. However, working together with IBM and SUSE showed that things are much worse. It turns out that in SLES 11, unlike SLES 12, the kernel itself creates the /dev/watchdog entry at boot time, and the watchdog driver simply attaches to it. Therefore, in SLES 11 the iTCO watchdog timer is just as inoperative as in SLES 12, but this is much harder to notice, because the failure is masked by the presence of a non-functional /dev/watchdog.
Needless to say, no amount of manipulation with BIOS, IMM or AMM settings, or any of the other wonderful tricks that xSeries abounds in, has any effect on the operability of Intel TCO.
Fortunately, after more than half a year of active work with IBM's hardware and software technical support, IBM managed to unearth an ancient manuscript dated 2008. It turns out that Intel also has another architecture for working with a watchdog timer, the IPMI watchdog, which is supported on the xSeries platform.
The essence of IPMI (Intelligent Platform Management Interface) is completely different from that of iTCO. In the IPMI architecture, somewhere on the motherboard there is a special controller (in fact, a separate computer) with its own processor, software, network interface and other gadgets, designed to monitor the operating parameters of the main computer's hardware and able to respond to changes in them in a predefined way. In IPMI terminology this controller is called the BMC (Baseboard Management Controller) or simply the MC. In IBM/Lenovo terminology, the device that implements its functions is called the IMM (Integrated Management Module) or IMM2. The BMC can do many different things, which are described in the manuscript mentioned above, but what matters for us now is that one of its functions is a watchdog. Clearly, the IPMI watchdog timer is an honest device, separate from the Intel processor, that generally keeps working independently as long as the motherboard as a whole has not failed.
The manuscript describes working with the watchdog timer in the genre of the authors' commentary on a certain instruction, MIGR-5069505, that has not survived to our day, and it relies on outdated software versions whose capabilities are not always relevant anymore. Still, it is quite possible to understand what it is about, and a brief, updated summary of this secret knowledge is presented below.
A pleasant surprise is that IPMI support is integrated into modern Linux distributions. It consists of several components, three of which are of interest to us.
First, there is the ipmi.service service, which lets programs communicate with the BMC. In SLES 12 this service is installed and started automatically. This can be checked as follows:
systemctl status ipmi
and, if necessary, the service can be started and enabled as usual:
systemctl start ipmi
systemctl enable ipmi
Second, there is the IPMI watchdog driver itself, ipmi_watchdog. It is installed automatically but is not loaded automatically (apparently on the assumption that the administrator should be sure of the hardware settings before allowing a hardware reset on timeout). You can load this driver manually with the command:
modprobe ipmi_watchdog
You can enable automatic loading at system startup by creating the file ipmi_watchdog.conf in the /etc/modules-load.d directory, consisting of the single line “ipmi_watchdog”:
echo ipmi_watchdog > /etc/modules-load.d/ipmi_watchdog.conf
Third, there is the ipmitool utility, which is installed automatically and lets you run various BMC commands, including, for example, checking the status of the watchdog timer:
ipmitool mc watchdog get
If your system has a BMC, the response to this command will look something like this:
Watchdog Timer Use: SMS/OS (0x04)
Watchdog Timer Is: Stopped
Watchdog Timer Actions: No action (0x00)
Pre-timeout interval: 0 seconds
Timer Expiration Flags: 0x00
Initial Countdown: 300 sec
Present Countdown: 300 sec
If, for example, a high-availability cluster is started, it will set the proper watchdog timer parameters itself (on my system, for example, a period of 5 seconds and the Hard reset action).
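To check from a script whether the timer has actually been armed, the status output can be picked apart field by field. This is a rough sketch: the helper names wdt_field and wdt_is_stopped are hypothetical, and the parsing assumes the "Name: value" layout of the listing shown above.

```shell
#!/bin/sh
# Hypothetical helpers for reading "ipmitool mc watchdog get" output on stdin.

# wdt_field NAME: print the value that follows "NAME:" in the output.
wdt_field() {
    sed -n "s/^$1:[[:space:]]*//p"
}

# wdt_is_stopped: succeed if the timer state field reads "Stopped".
wdt_is_stopped() {
    [ "$(wdt_field 'Watchdog Timer Is')" = "Stopped" ]
}
```

For instance, `ipmitool mc watchdog get | wdt_field 'Watchdog Timer Actions'` prints just the configured action, and `ipmitool mc watchdog get | wdt_is_stopped && echo "watchdog is not armed"` can serve as a simple warning in a monitoring script.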
Unfortunately, even a properly installed ipmi service and ipmi_watchdog driver, together with the presence of the /dev/watchdog file, do not guarantee that everything works as it should. What is the matter? It turns out that some versions of SLES 12 have the nasty habit of loading the softdog driver on their own initiative, trying to emulate the watchdog timer in software (an absolutely senseless and harmful exercise). And since softdog is loaded before ipmi_watchdog, the latter, unable to create the already existing /dev/watchdog file, traditionally does nothing, modestly muttering something into the depths of the system log. Therefore, our last task is to look for the dog by issuing the command
lsmod | grep dog
and analyzing its output. If we see ipmi_watchdog there and do not see softdog, then most likely everything is working correctly. If softdog is there, you need to get rid of it somehow, which in some versions of SLES 12 may not be entirely trivial.
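The same check can be wrapped into a small helper that turns the module list into a one-word verdict. This is a sketch only: the function name watchdog_verdict and its output words are hypothetical, and it looks at nothing but the module names in the first column of the lsmod output.

```shell
#!/bin/sh
# Hypothetical helper: reads lsmod output on stdin and prints
#   softdog  if softdog is loaded (IPMI is then probably not in charge),
#   ipmi     if ipmi_watchdog is loaded and softdog is not,
#   none     if neither module is loaded.
watchdog_verdict() {
    mods=$(awk '{ print $1 }')      # first column holds the module names
    if printf '%s\n' "$mods" | grep -qx 'softdog'; then
        echo "softdog"
    elif printf '%s\n' "$mods" | grep -qx 'ipmi_watchdog'; then
        echo "ipmi"
    else
        echo "none"
    fi
}
```

Run it as `lsmod | watchdog_verdict`. If the answer is softdog, one commonly used approach is a `blacklist softdog` line in a file under /etc/modprobe.d/. That it is sufficient here is an assumption: blacklisting only blocks alias-based autoloading, so it may not help on SLES 12 versions that load softdog explicitly.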
I suspect that the operability of the IPMI watchdog timer on IBM/Lenovo hardware may depend on the value of the OSWatchdog parameter, which is set in the IMM via the web interface or the asu utility (asu64). This parameter can be set to a number of minutes or turned off. I have it set to 2.5 minutes (the minimum value), but this does not affect the watchdog timer interval programmed into the BMC.
So, to summarize. softdog, Intel TCO and IPMI may all look like valid ways to use the watchdog timer on the IBM/Lenovo platform, but in reality only IPMI is functional. The IPMI watchdog driver is installed automatically in SLES but has to be registered for loading manually. The softdog driver is installed automatically and sometimes has to be prevented from loading manually. The Intel TCO driver is installed and loaded automatically but has absolutely no effect, since it is completely inoperative on this platform.
I hope this article helps someone understand a little more about the difficult business of building high-availability systems under Linux.