IPMI Technology Overview

System administrators and engineers use IPMI to manage server platforms remotely, and it makes their lives much easier. You no longer have to walk over to the server every time to press the reset button: you can respond to critical problems from a comfortable chair at home. This article covers the main components of IPMI and explains how the technology works.

What is IPMI

IPMI stands for Intelligent Platform Management Interface. Through IPMI, you can connect to a server remotely and control its operation:

Monitor the physical condition of the hardware: temperatures of individual system components, voltage levels, fan speeds
Restore server operation automatically or manually (remote reboot, power on/off, loading ISO images and software updates)
Control peripherals
Keep an event log
Store information about the equipment used

Suppose an engineer reconfigures the network on a server, makes a configuration error, and loses SSH access. How can they reach the server now? By connecting via IPMI and correcting the settings.

IPMI is useful because these functions are available regardless of the processor, BIOS, or operating system (OS) of the managed platform. For example, you can remotely restart a server whose OS has frozen, or look for the cause of a CPU failure in the system event log. You can even control a server that is powered off, as long as it is connected to mains power.

After a server is installed and connected to the network, Selectel engineers configure the BIOS and IPMI. From then on, they can leave the noisy server room and continue configuring the hardware remotely. Once the initial configuration is complete, Selectel customers can manage their dedicated servers and custom-configuration servers through IPMI.

Historical background

The first version of the specification, IPMI v1.0, was developed jointly by Intel, Dell, NEC, and Hewlett-Packard in 1998. Vulnerabilities and shortcomings found in practice were addressed in the subsequent versions, IPMI v1.5 and v2.0.

The IPMI specification standardizes the communication interface rather than a particular hardware implementation, so IPMI does not require proprietary devices or specific microcontrollers. Vendors follow the specification and build their own IPMI hardware into their server platforms:

Vendor	IPMI implementation
Cisco	Cisco IMC (Integrated Management Controller)
Dell	iDRAC (Integrated Dell Remote Access Controller)
HP	iLO (Integrated Lights-Out)
IBM	IMM (Integrated Management Module)
Lenovo	IMM (Integrated Management Module)
Supermicro	SIM (Supermicro Intelligent Management)

Each vendor sets its own price for its technology. When IPMI becomes more expensive, server rental prices rise too, since they depend directly on hardware costs.

Vendor solutions differ in:

How much visibility they provide into hardware status
Their own set of tools for restoring server operation when a component fails
The ability to collect statistics on all server components, including those on PCI, NVM, and other expansion cards
Support for ordinary computers, not just server hardware, via PCI Express expansion cards

In practice, basic IPMI functionality is enough for comfortable work with a remote console and timely notification of problems.

Although vendors ship modified and refined versions of IPMI, the underlying architecture stays similar. Let us look at what the technology consists of, based on Intel's official specification.

The basic components of any IPMI implementation

Management Controllers

At the center of the architecture is the “brain” of IPMI: the BMC (Baseboard Management Controller) microcontroller, through which the server is managed remotely. In essence, the BMC is a separate computer with its own software and network interface, either integrated into the motherboard or connected as an expansion card over the PCI management bus.

The BMC is powered from the motherboard's standby voltage, so it is always running, regardless of the server's power state.

Additional management controllers (Management Controllers, MCs) can be connected to the BMC to extend the basic management capabilities. For example, while the BMC manages the main system, MCs monitor individual subsystems: redundant power supplies, RAID drives, peripherals.

MCs are supplied as separate boards, independent of the central BMC, which is why they are also called Satellite Controllers. A system can have several satellite controllers but only one central BMC.

The BMC and the additional controllers are connected via IPMB (Intelligent Platform Management Bus). IPMB is a bus based on I2C (Inter-Integrated Circuit), through which the BMC routes management commands to different parts of the architecture:

Communicates with optional controllers (MCs)
Reads sensor data (Sensors)
Accesses non-volatile storage (Non-Volatile Storage)

IPMI is designed so that the remote administrator has no direct access to system components. For example, to get sensor data, the remote administrator sends a command to the BMC, and the BMC in turn queries the sensors.
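The indirection described above can be illustrated with a minimal Python sketch, assuming a toy in-memory model. The `Bmc` class, its `attach_sensor` and `handle_get_sensor_reading` methods, and sensor number 0x30 are all hypothetical; the point is that the remote side only ever calls the BMC, and only the BMC reads the sensor. The value 0xCB is the specification's "requested sensor, data, or record not present" completion code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

CC_OK = 0x00
CC_NOT_PRESENT = 0xCB  # "requested sensor, data, or record not present"


@dataclass
class Bmc:
    """Toy BMC: the only component allowed to touch the sensors (hypothetical model)."""
    # Sensor number -> callable standing in for a read over IPMB/I2C
    _sensors: Dict[int, Callable[[], int]] = field(default_factory=dict)

    def attach_sensor(self, number: int, read: Callable[[], int]) -> None:
        self._sensors[number] = read

    def handle_get_sensor_reading(self, number: int) -> Tuple[int, Optional[int]]:
        """Serve a remote request: return (completion code, raw reading)."""
        read = self._sensors.get(number)
        if read is None:
            return CC_NOT_PRESENT, None
        return CC_OK, read()


# The remote administrator talks only to the BMC, never to the sensor itself.
bmc = Bmc()
bmc.attach_sensor(0x30, lambda: 42)          # hypothetical CPU temperature sensor
print(bmc.handle_get_sensor_reading(0x30))   # (0, 42)
print(bmc.handle_get_sensor_reading(0x31))   # (203, None)
```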

In addition to sending commands to the BMC, you can configure the controller to automatically perform actions using the following mechanisms:

PEF (Platform Event Filtering)	The BMC stores an event filter table that specifies which events to respond to and what actions to take. When the BMC receives an event message, it compares the event data against the table and selects the response. Possible actions include shutting down or rebooting the system and generating an alert.
Watchdog timer	The timer triggers an action after a specified period of time. Actions include shutting down or restarting the server and issuing an interrupt. If the timeout value is set to 0, the action is executed immediately. Typically, software on the host resets the timer periodically; if the system stops doing so (for example, because it hangs), the timer expires and the action is triggered.
Firmware Firewall	Some BMC actions that are appropriate for a stand-alone server can disrupt modular platforms (for example, blade servers). To prevent such problems, the firewall lets the BMC block settings, IPMI commands, and write operations coming from the system interface. The firewall also provides commands for finding out which commands and control functions are available on a specific platform.
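The PEF lookup from the table can be sketched as a simple filter match. This is a simplified Python model, not the specification's actual filter-entry layout: `Event`, `FilterEntry`, `Action`, and `pef_actions` are hypothetical names, and real PEF entries match on more fields (for example generator ID, sensor number, and event data masks).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Action(Enum):
    ALERT = "alert"
    POWER_DOWN = "power down"
    RESET = "reset"
    POWER_CYCLE = "power cycle"


@dataclass(frozen=True)
class Event:
    sensor_type: str  # e.g. "temperature", "memory"
    severity: str     # e.g. "warning", "critical"


@dataclass(frozen=True)
class FilterEntry:
    sensor_type: Optional[str]  # None acts as a wildcard
    severity: Optional[str]     # None acts as a wildcard
    action: Action


def pef_actions(table: List[FilterEntry], event: Event) -> List[Action]:
    """Compare an incoming event against each table entry and collect the actions to take."""
    actions = []
    for entry in table:
        if entry.sensor_type is not None and entry.sensor_type != event.sensor_type:
            continue
        if entry.severity is not None and entry.severity != event.severity:
            continue
        actions.append(entry.action)
    return actions


table = [
    FilterEntry("temperature", "critical", Action.POWER_DOWN),
    FilterEntry(None, "critical", Action.ALERT),
]
print(pef_actions(table, Event("temperature", "critical")))  # POWER_DOWN and ALERT
print(pef_actions(table, Event("memory", "warning")))         # []
```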

Configuring PEF and Watchdog Timer in server BIOS
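The Watchdog timer behaviour from the table can be modelled with a simulated clock. In this sketch the `WatchdogTimer` class and its methods are hypothetical: healthy host software calls `reset()` periodically; when it stops (for example, because the OS hangs), `tick()` eventually drives the countdown to zero and the configured action fires. A zero timeout fires immediately, as the table notes.

```python
from typing import Callable


class WatchdogTimer:
    """Toy BMC watchdog driven by a simulated clock, in seconds (hypothetical model)."""

    def __init__(self, timeout_s: float, action: Callable[[], None]) -> None:
        self.timeout_s = timeout_s
        self.action = action
        self.remaining = timeout_s
        self.fired = False
        if timeout_s == 0:  # a zero timeout triggers the action immediately
            self._expire()

    def reset(self) -> None:
        """Called periodically by healthy host software to restart the countdown."""
        if not self.fired:
            self.remaining = self.timeout_s

    def tick(self, elapsed_s: float) -> None:
        """Advance the simulated clock; expire if the host stopped resetting the timer."""
        if self.fired:
            return
        self.remaining -= elapsed_s
        if self.remaining <= 0:
            self._expire()

    def _expire(self) -> None:
        self.fired = True
        self.action()


log = []
wd = WatchdogTimer(10, lambda: log.append("hard reset"))
wd.tick(6)
wd.reset()   # the host is alive and resets the timer
wd.tick(6)   # 4 s left, nothing happens
wd.tick(5)   # the host hung and sent no reset: the timer expires
print(log)   # ['hard reset']
```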

Non-volatile storage

Non-volatile storage remains accessible (for example, over the local network) even if the server's CPU fails. It consists of three areas:

System Event Log (SEL) - a log of system events
Sensor Data Record (SDR) Repository - a repository of sensor data records
Field Replaceable Unit (FRU) Info - inventory information about system modules

System modules either generate events (Event Generators) or receive them (Event Receivers). MCs act as event generators, while the BMC can play both roles. The BMC receives event messages over the system interface and IPMB, then logs them to the System Event Log (SEL).

The specification sets mandatory requirements for the SEL:

The SEL must hold at least 16 events
The information stored in the SEL must be accessible regardless of access to the BMC and of the state of the managed platform

IPMI commands allow reading and clearing the SEL. Since SEL memory is limited, the log should be checked and cleared periodically so that new events can be recorded. You can configure automatic SEL clearing in the BMC settings. Auto-clearing works differently on different platforms: either old entries are overwritten by new ones, or the entire log is cleared.
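The two auto-clearing behaviours described above can be compared in a short Python sketch. The `SystemEventLog` class and its `"overwrite"` / `"clear_all"` policy names are hypothetical; real BMCs expose this as a vendor-specific setting. The default capacity of 16 reflects the specification's minimum.

```python
from collections import deque
from typing import Deque, List


class SystemEventLog:
    """Toy SEL with a fixed capacity and one of two auto-clean policies (hypothetical)."""

    def __init__(self, capacity: int = 16, policy: str = "overwrite") -> None:
        # 16 entries is the minimum SEL size required by the specification
        if policy not in ("overwrite", "clear_all"):
            raise ValueError("policy must be 'overwrite' or 'clear_all'")
        self.capacity = capacity
        self.policy = policy
        self._entries: Deque[str] = deque()

    def add(self, entry: str) -> None:
        if len(self._entries) >= self.capacity:
            if self.policy == "overwrite":
                self._entries.popleft()  # drop the oldest entry to make room
            else:
                self._entries.clear()    # wipe the whole log, then record the new event
        self._entries.append(entry)

    def read(self) -> List[str]:
        return list(self._entries)

    def clear(self) -> None:
        """Equivalent of an explicit clear issued by the administrator."""
        self._entries.clear()


ring = SystemEventLog(capacity=3, policy="overwrite")
wipe = SystemEventLog(capacity=3, policy="clear_all")
for event in ["e1", "e2", "e3", "e4"]:
    ring.add(event)
    wipe.add(event)
print(ring.read())  # ['e2', 'e3', 'e4']
print(wipe.read())  # ['e4']
```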

The event message carries information from the SDR Repository and FRU Info areas.

SDRs describe the types and number of sensors, their ability to generate events, and the types of readings they provide. SDRs also record the number and types of devices connected to IPMB. SDRs are stored in a memory area called the SDR Repository (Sensor Data Record Repository).

FRU records contain serial numbers and model information for the parts of various system modules: the processor, memory boards, I/O boards, and controllers.

FRU information can be provided by an MC (via IPMI commands) or read from non-volatile SEEPROM chips (Serial Electrically Erasable Programmable Read-Only Memory) connected via the Private Management Bus. On this bus, controllers use low-level I2C commands to communicate with devices that do not support IPMI commands.

Practical use

Suppose a client complains that the server hangs, but the operating system logs look fine. We check the SEL and see errors on one of the RAM modules, along with the slot it is installed in. We replace the module, and the server runs like clockwork.

Above, we reviewed the basic modules of the IPMI architecture. Now let us turn to the structure of the transmitted commands and the interfaces used for remote connection.

IPMI command structure

IPMI exchanges messages in a request-response format. Requests carry commands, which initiate actions and set values. The request-response format allows several controllers to communicate over a single bus at the same time.

IPMI messages contain a basic set of fields, uniform for all commands:

Network Function (NetFn) identifies the functional group the command belongs to (chassis commands, events, storage, etc.)
The Request / Response Identifier field distinguishes requests from responses.
Requester's ID identifies the source of the message. For IPMB, for example, it contains the LUN (Logical Unit Number) of the device.
Responder's ID addresses the request to the desired responder.
Command - the command code, unique within its Network Function
Data - additional parameters (for example, the data returned in the response)

Every response also carries a Completion Code that reports the result of the command. If an error occurred while executing the request, the code is nonzero and identifies the error.
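The message fields and the completion code can be modelled as plain data. The class and function names below are hypothetical; the numeric values come from the IPMI specification: NetFn 0x00 (Chassis), 0x04 (Sensor/Event), 0x06 (App), and 0x0A (Storage) are request codes, and a response uses the next odd value (request NetFn + 1), which is how requests and responses are told apart. Get Device ID is NetFn App, command 0x01. Only a small subset of completion codes is listed.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Some request NetFn codes from the specification; request codes are even.
NETFN_CHASSIS = 0x00
NETFN_SENSOR_EVENT = 0x04
NETFN_APP = 0x06
NETFN_STORAGE = 0x0A

# A small subset of completion codes; 0x00 means success.
COMPLETION_CODES = {
    0x00: "Command completed normally",
    0xC1: "Invalid command",
    0xCC: "Invalid data field in request",
    0xFF: "Unspecified error",
}


@dataclass
class IpmiRequest:
    netfn: int
    cmd: int
    data: List[int] = field(default_factory=list)


@dataclass
class IpmiResponse:
    netfn: int
    cmd: int
    completion_code: int
    data: List[int] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return self.completion_code == 0x00


def make_response(req: IpmiRequest, completion_code: int,
                  data: Optional[List[int]] = None) -> IpmiResponse:
    # The response keeps the command code and uses the odd NetFn (request NetFn + 1).
    return IpmiResponse(req.netfn + 1, req.cmd, completion_code, list(data or []))


def describe(resp: IpmiResponse) -> str:
    return COMPLETION_CODES.get(resp.completion_code,
                                f"Completion code 0x{resp.completion_code:02X}")


get_device_id = IpmiRequest(NETFN_APP, 0x01)        # Get Device ID
reply = make_response(get_device_id, 0x00, [0x20])  # hypothetical payload
print(hex(reply.netfn), reply.ok, describe(reply))  # 0x7 True Command completed normally
```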

The channels that carry messages fall into three categories, each with its own interface:

BMC - MCs, Sensors, Storage (IPMB)
BMC - Managed Platform (System Interface)
BMC - Remote Administrator (LAN, Serial Interface)

In this model, the BMC acts as a switch that interconnects the interfaces (the specification calls this bridging):

Serial ↔ IPMB
Serial ↔ System Interface
LAN ↔ IPMB
LAN ↔ System Interface
Serial ↔ PCI Management Bus
LAN ↔ PCI Management Bus
Other combinations, including Serial ↔ LAN

As messages pass through different interfaces of the architecture, the basic set of fields is extended with channel numbers and framing. For example, IPMB adds address fields and checksums that verify the integrity of the transmitted data, while the LAN interface encapsulates IPMI messages in UDP/IP packets.
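The IPMB framing mentioned above can be sketched in Python. The function names are hypothetical; the layout follows the IPMB request format: responder address, NetFn/LUN, a header checksum, then requester address, sequence/LUN, command, data, and a second checksum. Each checksum is the two's complement of the byte sum, so a receiver can check that every covered region sums to 0 modulo 256. The requester sequence number (rqSeq) is what lets a requester match responses to requests when several controllers share the bus. The addresses 0x20 (BMC) and 0x81 are example values.

```python
from typing import Iterable, List


def ipmi_checksum(data: Iterable[int]) -> int:
    """Two's-complement checksum: covered bytes plus checksum sum to 0 modulo 256."""
    return (-sum(data)) & 0xFF


def build_ipmb_request(rs_sa: int, netfn: int, rs_lun: int, rq_sa: int,
                       rq_seq: int, rq_lun: int, cmd: int, data: List[int]) -> bytes:
    header = [rs_sa, (netfn << 2) | rs_lun]             # responder address, NetFn/LUN
    body = [rq_sa, (rq_seq << 2) | rq_lun, cmd, *data]  # requester address, seq/LUN, command, data
    return bytes(header + [ipmi_checksum(header)] + body + [ipmi_checksum(body)])


def verify_ipmb(frame: bytes) -> bool:
    """Receiver-side integrity check of both checksummed regions."""
    return len(frame) >= 7 and sum(frame[:3]) % 256 == 0 and sum(frame[3:]) % 256 == 0


# Get Device ID (NetFn App 0x06, command 0x01) addressed to the BMC at 0x20
frame = build_ipmb_request(rs_sa=0x20, netfn=0x06, rs_lun=0, rq_sa=0x81,
                           rq_seq=1, rq_lun=0, cmd=0x01, data=[])
print(frame.hex(" "))                                         # 20 18 c8 81 04 01 7a
print(verify_ipmb(frame), verify_ipmb(frame[:-1] + b"\x00"))  # True False
```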

Remote Access Interfaces

In the original version of IPMI, the remote console connected to the BMC over the serial interface (Serial Interface). IPMI v2.0 relies on the network interface (LAN Interface).

The LAN interface is provided through a dedicated BMC network port with its own IP address. When transmitting via the LAN, IPMI messages go through several encapsulation steps:

IPMI messages are packed into IPMI Session packets (session establishment is covered in more detail later in this article)
IPMI Session packets are encapsulated in the Remote Management Control Protocol (RMCP)
RMCP packets are carried in UDP datagrams (UDP port 623)
The datagrams are wrapped in IP packets and Ethernet frames
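The encapsulation steps listed above can be reproduced byte by byte. This Python sketch builds, but does not send, a Get Channel Authentication Capabilities request (NetFn App 0x06, command 0x38, data: current channel 0x0E and requested privilege level Administrator 0x04). It assumes the simpler IPMI v1.5 session header with authentication type "none", which is what such an unauthenticated request commonly uses; IPMI v2.0 RMCP+ sessions use a different header. The helper function names are hypothetical.

```python
import struct
from typing import List

RMCP_PORT = 623  # UDP port used for IPMI over LAN


def checksum(data) -> int:
    """Two's-complement checksum used in IPMI message headers and bodies."""
    return (-sum(data)) & 0xFF


def ipmi_lan_message(netfn: int, cmd: int, data: List[int],
                     rs_addr: int = 0x20, rq_addr: int = 0x81, rq_seq: int = 0) -> bytes:
    header = [rs_addr, netfn << 2]             # BMC address, NetFn with LUN 0
    body = [rq_addr, rq_seq << 2, cmd, *data]  # requester ID, seq with LUN 0, command, data
    return bytes(header + [checksum(header)] + body + [checksum(body)])


def ipmi15_session(message: bytes, session_seq: int = 0, session_id: int = 0) -> bytes:
    # Auth type 0x00 (none): no 16-byte authentication code follows the session ID.
    return struct.pack("<BII", 0x00, session_seq, session_id) + bytes([len(message)]) + message


def rmcp(payload: bytes) -> bytes:
    # RMCP v1.0 (0x06), reserved, sequence 0xFF (no RMCP ACK), message class 0x07 (IPMI)
    return bytes([0x06, 0x00, 0xFF, 0x07]) + payload


packet = rmcp(ipmi15_session(ipmi_lan_message(0x06, 0x38, [0x0E, 0x04])))
print(len(packet), packet.hex(" "))
```

To actually transmit it, the packet would be passed to a UDP socket, for example `sock.sendto(packet, (bmc_address, RMCP_PORT))`; the operating system's network stack adds the IP and Ethernet headers.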

Encapsulation of IPMI messages during transmission over LAN

The serial interface is no longer used to connect a remote console to the BMC, but it is still needed for two functions:

Serial port sharing
Serial-over-LAN (SoL)

Serial Port Sharing lets the BMC's serial controller and the managed system share a common serial connector. It is typically used to implement BIOS Console Redirection, that is, to redirect the BIOS console to the BMC.

Serial-over-LAN lets you interact over the network with system components that only understand a serial interface. From the server console, you can also send commands directly to server devices (chips, cards, disks, and so on). SoL is designed to work together with Serial Port Sharing.

Sessions and authentication

On the LAN and serial interfaces, IPMI messaging can start only after a session is established; the IPMI Session packets are formed within that session.

Establishing a session authenticates a specific user. The session must be activated before IPMI messages can be exchanged, using the following sequence:

The remote console requests authentication data from the BMC
The BMC replies with the supported authentication types (none, password, MD2 and MD5 algorithms, etc.)
The remote console sends the selected authentication type and the user login
If the user has access privileges for the channel, the BMC replies with a session ID. Because each session gets its own ID, several sessions can run on one channel at once (the specification requires support for at least four simultaneous sessions)
The remote console sends a session activation request containing the session ID and authentication data (username, password, or keys, depending on the selected authentication type)
The BMC verifies the user information, confirms the session ID, and sends an activation response
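The handshake above can be traced with a toy Python model. Everything here is hypothetical and deliberately simplified: `ToyBmc` advertises only two of the authentication types the specification defines, compares plaintext passwords instead of computing real authentication codes, and ignores channel privilege levels. It only shows the order of the steps, the issuing of unique session IDs, and the four-session limit.

```python
import secrets
from typing import Dict, Optional, Set


class ToyBmc:
    """Simplified model of session establishment; no real cryptography (hypothetical)."""

    SUPPORTED_AUTH = {"password", "md5"}  # a subset of the types the spec defines
    MAX_SESSIONS = 4  # the specification requires at least four simultaneous sessions

    def __init__(self, users: Dict[str, str]) -> None:
        self.users = users                 # username -> password
        self.pending: Dict[int, str] = {}  # session ID -> user awaiting activation
        self.active: Set[int] = set()

    def get_auth_capabilities(self) -> Set[str]:
        """Steps 1-2: report the supported authentication types."""
        return set(self.SUPPORTED_AUTH)

    def request_session(self, auth_type: str, username: str) -> Optional[int]:
        """Steps 3-4: accept the chosen auth type and login, hand out a session ID."""
        if auth_type not in self.SUPPORTED_AUTH or username not in self.users:
            return None
        if len(self.pending) + len(self.active) >= self.MAX_SESSIONS:
            return None
        session_id = 0
        while session_id == 0 or session_id in self.pending or session_id in self.active:
            session_id = secrets.randbits(32)  # nonzero, unique session ID
        self.pending[session_id] = username
        return session_id

    def activate_session(self, session_id: int, username: str, password: str) -> bool:
        """Steps 5-6: verify the credentials and confirm the session ID."""
        if self.pending.get(session_id) != username or self.users[username] != password:
            return False
        del self.pending[session_id]
        self.active.add(session_id)
        return True


bmc = ToyBmc({"admin": "secret"})
sid = bmc.request_session("md5", "admin")
print(bmc.activate_session(sid, "admin", "secret"))  # True
```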

Sessions terminate automatically if there is no activity for a specified interval or if the connection is lost.

Access to the BMC can be blocked by sending many session activation requests at once, so that all of its resources are spent tracking sessions awaiting activation. To prevent this kind of attack, it is recommended that BMC implementations use an LRU (Least Recently Used) algorithm. The algorithm validates the session ID of the earliest session activation request. For example, a remote console is launched in a browser in a noVNC session. If you open several tabs with running sessions, text input will be available only in the earliest opened tab.
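As a minimal sketch of what LRU bookkeeping for pending sessions can look like, the Python class below keeps a fixed-size table and, when it is full, evicts the least recently used pending entry, so a flood of requests cannot hold every slot indefinitely. This is one possible reading of the recommendation, not the specification's mandated behaviour; `PendingSessionTable` and its methods are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class PendingSessionTable:
    """Fixed-size table of sessions awaiting activation, with LRU eviction (hypothetical)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self._entries: OrderedDict = OrderedDict()  # session ID -> username, oldest first

    def add(self, session_id: int, username: str) -> Optional[int]:
        """Register a pending session; return the evicted session ID, if any."""
        evicted = None
        if session_id in self._entries:
            self._entries.move_to_end(session_id)
        elif len(self._entries) >= self.capacity:
            evicted, _ = self._entries.popitem(last=False)  # least recently used entry
        self._entries[session_id] = username
        return evicted

    def touch(self, session_id: int) -> bool:
        """Mark a pending session as recently used (e.g. on a retried request)."""
        if session_id not in self._entries:
            return False
        self._entries.move_to_end(session_id)
        return True

    def pop(self, session_id: int) -> Optional[str]:
        """Remove a session once it is activated; return its username or None."""
        return self._entries.pop(session_id, None)


table = PendingSessionTable(capacity=2)
table.add(1, "alice")
table.add(2, "bob")
table.touch(1)                # session 1 was used again, so 2 is now least recent
print(table.add(3, "carol"))  # 2
```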

When IPMI becomes unavailable

IPMI helps to restore the server when it fails. However, it may happen that the remote control system becomes unavailable. IPMI failures can be divided into four categories:

At the network level. Broken ports, non-operational equipment, cable defect, poorly crimped twisted pair
At the software level. System bug, BMC module hang, need to update module firmware
At the hardware level. Overheating, failure of critical components (memory, processor), system architecture defects
At the level of power. BMC power down or server power supply issues

These factors affect both IPMI and the server itself. Although the BMC is a chip independent of the rest of the server, in about 90% of cases a failure of this microcontroller indicates a failure of the server itself.

IPMI in practice

You can manage the server using IPMI through a web browser, utilities provided by manufacturers, and open source utilities.

The web interface of each IPMI implementation is different, but the access principle remains the same:

Enter the BMC port IP address in the address bar
Enter the username and password. The credentials are sometimes printed directly on the hardware.

At Selectel, we work with IPMI modules from Intel, Asus, and Supermicro. As an example, take a look at the Supermicro web interface:

The capabilities of the web interface are also implemented in the Supermicro IPMIView graphical utility:

To manage the hardware from the Linux console, you install a suitable utility (for example, ipmitool for local and remote management, or IPMICFG for local management). Then the IPMI device is made available to the system and the BMC is configured with console commands.
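A small Python helper can assemble ipmitool invocations for both local and remote use. The helper `ipmitool_cmd`, the address 192.0.2.10, the user name, and the password-file path are hypothetical; the flags (`-I lanplus`, `-H`, `-U`, `-f`) and subcommands (`sdr list`, `sel list`, `chassis power status`) are standard ipmitool options. Local access assumes the kernel IPMI drivers (typically `ipmi_si` and `ipmi_devintf`) are loaded; remote access assumes the BMC port is reachable.

```python
from typing import List, Optional


def ipmitool_cmd(subcommand: List[str], host: Optional[str] = None,
                 user: Optional[str] = None,
                 password_file: Optional[str] = None) -> List[str]:
    """Build ipmitool arguments: local access if host is None, else IPMI v2.0 over LAN."""
    argv = ["ipmitool"]
    if host is not None:
        argv += ["-I", "lanplus", "-H", host]
        if user is not None:
            argv += ["-U", user]
        if password_file is not None:
            argv += ["-f", password_file]  # keeps the password out of the process list
    return argv + subcommand


print(ipmitool_cmd(["sdr", "list"]))  # ['ipmitool', 'sdr', 'list']
print(ipmitool_cmd(["chassis", "power", "status"], host="192.0.2.10",
                   user="ADMIN", password_file="/root/.ipmi_pass"))
```

The resulting list can be passed to `subprocess.run(..., check=True)`; building the argument list rather than a shell string avoids quoting problems with passwords and host names.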

Selectel clients have access to IPMI for dedicated servers and custom-configuration servers. IPMI is provided as a KVM console that runs in a noVNC session in the control panel. To open it, click the console icon in the upper right corner of the server information card:

The console opens in a browser and adjusts to the screen size. If desired, the console can be used even through a phone or tablet.

The session ends when you leave the control panel.

Conclusion

IPMI is a fully autonomous component of the server platform: it does not depend on the operating system, the BIOS, or the server CPU.

Thanks to IPMI, server maintenance costs go down and system administrators' lives get easier. There is no need to be near the hardware all the time: it is controlled remotely over the network.

In this article, we looked at the main components of IPMI. However, the details of the technology are extensive. Talented developers, relying on the specification, can create their own IPMI-equipment and open-source tools, simultaneously eliminating the shortcomings of the current specification and opening up new remote management capabilities.

Materials used in the article:

IPMI v2.0 Specification
Official Supermicro documentation
