How to manage data center power consumption

    The traditional way to monitor energy consumption in a data center is to use smart power strips, known as Power Distribution Units (PDUs).



    You can connect to each one over the network or through the console and see the consumption of a whole segment or of an individual outlet, which opens up a lot of possibilities. Manufacturers bundle suitable software packages or even entire specialized servers, for example SPM from Server Technology.

    You can draw a beautiful diagram of the data center with the location of the servers and build all kinds of graphs. The price tag is to match: inhumane.

    With them you can see how much the data center consumes, but that is all. What if you want to control consumption?

    Use the built-in server features!


    How the server monitors and controls power consumption

    A modern server lets you read its power consumption through IPMI:
    ~> sudo ipmitool sdr
    Power Unit | 150W | ns
    Alternatively, the same data can be read through raw IPMI commands.
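To collect such readings automatically, the sensor line can be parsed by a script. The following is a minimal sketch, assuming the pipe-separated "name | reading | status" layout shown in the sample line above; the function name `parse_sdr_power` is hypothetical, and real sensor names and units vary between server vendors.

```python
import re
from typing import Optional, Tuple


def parse_sdr_power(line: str) -> Optional[Tuple[str, float]]:
    """Parse one 'name | reading | status' line; return (sensor, watts) or None."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 3:
        return None
    name, reading, _status = fields
    # Accept readings such as "150W", "150 W" or "150 Watts" (assumed formats).
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(?:W|Watts)", reading)
    if match is None:
        return None  # not a power sensor, or no numeric reading
    return name, float(match.group(1))


if __name__ == "__main__":
    print(parse_sdr_power("Power Unit | 150W | ns"))
```

Lines that do not carry a wattage, such as fan or temperature sensors, return `None` and can simply be skipped.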

    This became possible thanks to Intel Node Manager technology, which was introduced with the Nehalem platform (Xeon 5500 series). It in turn relies on the capabilities of smart power supplies and on control via PMBus.



    On the server side, these capabilities are built on the Intel Management Engine and the BMC (OCP servers can do without a BMC):


    Figure: Server service bus connections

    Server power consumption is controlled by changing the processor's performance and throttling states (P-states and T-states). It is also possible to limit memory power consumption, but this appeared only in the latest revision, together with the E5 processors.

    Figure: Comparison of NM versions (RMLY: Xeon E5, BRLW: Xeon E3)

    In total, here is what you get:





    1. Monitoring consumption over time. Intel Intelligent Power Node Manager measures platform consumption with an acceptable margin of error of ±10%. Data is collected in real time through the Power Supply Management Interface (PSMI).
    2. Platform power limiting (power capping). You can cap the platform's consumption from above by setting a limit that it will not exceed. The policy is set through the IPMI/DCMI console, and processor consumption is limited by switching P-states. The limit applies to processors and memory.
    3. Alerts on excessive consumption. If the platform cannot stay within the assigned power budget, a warning is sent to the management console.
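As an illustration of setting a limit through DCMI, the sketch below only builds the command lines and does not execute anything. It assumes that `ipmitool` is installed and that the BMC implements the DCMI power management commands; the helper name `dcmi_power_cap_commands` is hypothetical, and the exact behavior depends on the platform firmware.

```python
from typing import List


def dcmi_power_cap_commands(limit_watts: int) -> List[List[str]]:
    """Build the ipmitool invocations that set, activate and read back a power limit.

    Assumption: the target BMC supports DCMI power management.
    Nothing is executed here; the caller decides how to run the commands.
    """
    if limit_watts <= 0:
        raise ValueError("limit_watts must be a positive number of watts")
    base = ["ipmitool", "dcmi", "power"]
    return [
        base + ["set_limit", "limit", str(limit_watts)],  # store the limit value
        base + ["activate"],                              # start enforcing it
        base + ["get_limit"],                             # read the settings back
    ]


if __name__ == "__main__":
    for command in dcmi_power_cap_commands(300):
        print(" ".join(command))
```

Keeping command construction separate from execution makes it easy to review what would be sent to each server before applying a cap to a whole group.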


    In the rack-scale Open Compute Project and Open CloudServer solutions, consumption can also be limited through rack controllers.

    Power Capping

    This is the most interesting and important technology in energy management.

    Intel's processor lineup includes models with different thermal design power ratings, as well as special low-power L-series models. With those you get a processor that never exceeds a certain power, but that also delivers lower performance at a similar or higher price than a conventional processor.

    Does it make sense?

    As AnandTech's testing showed, no.

    Physics cannot be fooled: the same amount of work requires the same amount of energy (plus a difference that depends on how quickly the processors and memory enter low-power states). The total energy consumed is such that the price premium of the processor will not pay for itself over the server's service life.
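The arithmetic behind this argument is simply energy = power × time. The sketch below uses purely hypothetical figures, chosen so that both processors complete the same job with the same energy; they are not measurements from the AnandTech tests.

```python
def energy_wh(power_w: float, seconds: float) -> float:
    """Energy in watt-hours consumed at a constant power over a period of time."""
    return power_w * seconds / 3600.0


# Hypothetical figures for one and the same batch job:
# a conventional CPU draws more power but finishes sooner,
# a low-power model draws less but needs more time.
conventional = energy_wh(power_w=95.0, seconds=600.0)
low_power = energy_wh(power_w=60.0, seconds=950.0)

if __name__ == "__main__":
    print(f"conventional: {conventional:.2f} Wh, low-power: {low_power:.2f} Wh")
```

With equal energy per job, the low-power model saves nothing on the electricity bill for this workload, so its price premium has nothing to be paid back from.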

    By setting a power cap on a server with a conventional processor, you get the same result as with a low-power model.

    What is the Power Cap for?

    To smooth out consumption surges. And it is more convenient to cap the whole rack rather than an individual server. Servers are loaded unevenly, so some can draw more than the average and some less; what matters is that the total stays within the limit. Performance will hardly suffer (unless every system needs full load and Turbo Boost at the same time), while rack consumption drops. Most importantly, consumption across the data center becomes much more predictable, and more systems can be placed in one rack.
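One way to picture a rack-level cap is a simple proportional split of the rack budget. This is a minimal sketch under stated assumptions, not the algorithm used by Intel Node Manager or any rack controller: every server is guaranteed a hypothetical floor `min_w`, and the rest of the budget is shared in proportion to demand above that floor.

```python
from typing import Dict


def distribute_rack_cap(demand_w: Dict[str, float], rack_cap_w: float,
                        min_w: float) -> Dict[str, float]:
    """Split a rack power budget among servers according to their current demand."""
    if rack_cap_w < min_w * len(demand_w):
        raise ValueError("rack cap cannot cover the per-server minimum")
    if sum(demand_w.values()) <= rack_cap_w:
        # No contention: every server may draw what it asks for.
        return dict(demand_w)
    # Guarantee the floor, then share the remaining budget
    # in proportion to each server's demand above the floor.
    extra = {name: max(watts - min_w, 0.0) for name, watts in demand_w.items()}
    total_extra = sum(extra.values())
    spare = rack_cap_w - min_w * len(demand_w)
    return {name: min_w + spare * extra[name] / total_extra for name in demand_w}


if __name__ == "__main__":
    caps = distribute_rack_cap({"a": 400.0, "b": 200.0, "c": 150.0},
                               rack_cap_w=600.0, min_w=100.0)
    for name, cap in caps.items():
        print(f"{name}: {cap:.1f} W")
```

The busiest server keeps the largest share, lightly loaded servers give up headroom they are not using, and the sum of the individual caps always equals the rack limit when the rack is oversubscribed.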

    The same AnandTech testing supports this. A server that can use Turbo Boost delivers noticeably shorter response times than a machine that is rigidly capped from above. A rack whose systems are unevenly loaded will therefore perform better, and under sustained high load it will show some increase in response time while overall system performance remains stable. According to Intel, the space savings in this case can be 20% or more.

    Of course, it makes no sense to cap processors below their thermal design power, except when response time does not matter to you, the average processor load is well below the maximum, and you need to fit as many systems as possible into the rack.

    To sum up, consumption management technologies offer a number of advantages:
    1. Increased rack density: managing each server's power budget from the data center control system according to actual load lets you use the reserve of unused power for additional servers in the rack.
    2. Maximum performance during power and temperature surges: dynamic server power management lets you allocate more resources to mission-critical tasks by reducing the performance of secondary ones.
    3. Reduced power consumption: the cooling control system receives real data on server power consumption and temperature and can reduce its output according to actual needs.
    4. Load balancing: server power consumption and temperature can be fed into the cloud management system or virtualization environment to balance load between racks.
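The rack density gain in the first point comes down to a division. The sketch below uses hypothetical numbers (an 8 kW rack, servers provisioned at a 400 W nameplate rating versus a 320 W cap); real values depend on the hardware and the measured load.

```python
def servers_per_rack(rack_budget_w: float, per_server_w: float) -> int:
    """Number of servers whose provisioned power fits into the rack budget."""
    if rack_budget_w < 0 or per_server_w <= 0:
        raise ValueError("rack budget must be non-negative and per-server power positive")
    return int(rack_budget_w // per_server_w)


# Hypothetical example: the same rack provisioned by nameplate power
# and by an enforced power cap.
by_nameplate = servers_per_rack(8000.0, 400.0)
by_cap = servers_per_rack(8000.0, 320.0)

if __name__ == "__main__":
    print(f"by nameplate: {by_nameplate} servers, with a cap: {by_cap} servers")
```

Because the cap is enforced, the rack can be provisioned against the capped figure rather than the worst-case nameplate figure, which is where the additional servers come from.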



    Which tool makes it convenient to manage these functions?

    Intel DCM: Energy Director

    Intel offers the DataCenter Manager product, which has been split into two: DCM: Energy Director and DCM: Virtual KVM Gateway. In addition to monitoring functionality with a web-based interface, it is also a toolkit that can be integrated into your own software.

    Figure: Main panel

    Features and benefits of DCM: Energy Director

    Easy to install







    • Installation in minutes with minimal system requirements
    • Scan the network and add devices automatically or manually
    • User-friendly graphical interface with the simple addition of new racks, rows of racks and data center rooms for visualizing infrastructure

    Collects real-time consumption data
    • Out-of-Band data collection
    • Does not require access to the OS on the client
    • Collects statistics over a long period of time for trending and analysis.

    Analysis Tools
    • Identification of hot and cold zones in the data center helps reduce losses from overcooling
    • Detection of underloaded systems that may take on additional load or be limited in consumption
    • Visualization of server power consumption to assess the impact of changes in power management policies on device consumption

    History storage
    • Statistics are stored for one year (by default)
    • Consumption and temperature data are aggregated for viewing at the level of the room, the row of racks, and the individual rack
    • Data export integrates DCM with third-party analysis tools
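The aggregation by room, row, and rack can be pictured as a roll-up over a location hierarchy. This is only an illustrative sketch of the idea, not DCM's implementation; the tuple layout `(room, row, rack, watts)` and the function name `aggregate_power` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Reading = Tuple[str, str, str, float]  # (room, row, rack, watts) per server


def aggregate_power(readings: Iterable[Reading]) -> Dict[Tuple[str, ...], float]:
    """Sum per-server readings at the room, row and rack levels."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for room, row, rack, watts in readings:
        totals[(room,)] += watts            # room level
        totals[(room, row)] += watts        # row level
        totals[(room, row, rack)] += watts  # rack level
    return dict(totals)


if __name__ == "__main__":
    sample = [
        ("room1", "rowA", "rack1", 150.0),
        ("room1", "rowA", "rack1", 210.0),
        ("room1", "rowA", "rack2", 180.0),
        ("room1", "rowB", "rack1", 300.0),
    ]
    for location, watts in sorted(aggregate_power(sample).items()):
        print("/".join(location), watts)
```

The same roll-up applies to temperature if the sum is replaced with a maximum or an average per level.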

    Warning and Management System
    • Generates alerts and sends them to other management tools
    • Applies consumption policies to free up power reserve without sacrificing performance
    • Allows the use of consumption policies to reduce the risk of over-provisioning power to servers


    Here's how it looks live (screenshots):
    • Adding discovered servers to a data center rack
    • Creating an energy policy
    • View of the data center
    • Creating response thresholds
    • Optimization tips















    Monitoring
    • Monitoring of power consumption and inlet temperature, aggregated by rack, row, and room
    • Physical or logical groups
    • Alerts on user-defined power and temperature events
    • Power estimation algorithm for legacy servers that lack power monitoring
    • Display of asset tags and server serial numbers for a number of manufacturers

    Trend tracking
    • Logging of power and temperature data, with filtered queries for trend analysis
    • One year of data is stored for capacity planning

    Control
    • Patented intelligent group policy engine
    • Several active power policy types are supported simultaneously at different levels of the hierarchy
    • Workload prioritization is supported through policy directives
    • Policies can be enforced on a schedule, including power limits by time of day and/or day of the week
    • Group power limits are maintained while dynamically adapting to changing server load
    • Intel® Node Manager 2.0 supports memory power limiting and dynamic core allocation

    Agentless
    • Does not require installing any software agents on managed nodes

    Easy integration and coexistence
    • The device inventory system pre-scans the IP address range in use
    • High-level application programming interfaces described in Web Services Description Language (WSDL)
    • Can be deployed on a standalone management server or coexist on the same server as an ISV product
    • Power planning based on temperature conditions: simulation of inlet and outlet temperatures (depends on the manufacturer)
    • Exhaust air temperature sensor (depends on the manufacturer)

    Scalability
    • Ability to manage tens of thousands of servers

    Security
    • Secured APIs
    • Secure communication channels with managed nodes
    • Encryption of all sensitive data
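Enforcing power limits by time of day and day of the week, as mentioned in the feature list, can be modeled as a set of time windows. The sketch below is a hypothetical illustration and does not use the DCM API; the `ScheduledCap` class and the rule that the strictest matching limit wins are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class ScheduledCap:
    limit_w: int
    days: FrozenSet[int]  # 0 = Monday ... 6 = Sunday, as in datetime.weekday()
    start: time           # inclusive
    end: time             # exclusive; the window must not cross midnight

    def applies(self, now: datetime) -> bool:
        return now.weekday() in self.days and self.start <= now.time() < self.end


def active_limit(policies: Iterable[ScheduledCap], now: datetime) -> Optional[int]:
    """Return the strictest limit in force at `now`, or None if no policy applies."""
    limits = [policy.limit_w for policy in policies if policy.applies(now)]
    return min(limits) if limits else None


if __name__ == "__main__":
    business_hours = ScheduledCap(300, frozenset(range(5)), time(9, 0), time(18, 0))
    print(active_limit([business_hours], datetime(2024, 1, 1, 10, 0)))
```

A scheduler would evaluate `active_limit` periodically and push the resulting value to the group of servers, or remove the cap when no policy applies.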


    DCM: Energy Director works on all servers we manufacture and is available for both trial and production use.
