AI systems optimize data center cooling

    A year ago, data centers worldwide consumed about 2% of all the electricity generated on the planet, and analysts expect this figure to rise to 5% by 2020. At the same time, roughly half of that energy is spent on cooling. AI systems are designed to reduce these costs.

    Today we will talk about the latest developments in this area.


    Google project

    In 2016, DeepMind and Google developed an artificial intelligence system that monitors individual components of the data center. It gave data center administrators recommendations on how to optimize server power consumption. The solution reduced the energy consumed by the cooling systems by 40% and lowered the PUE overhead by 15%.

    According to the data center operators, the hints from the machine algorithms were useful in their work, but processing them took too much time. So Dan Fuenffinger, one of Google's engineers, suggested fully transferring control of the air conditioning systems to the intelligent solution. This would relieve the data center operators, who would only need to fine-tune the system and supervise the overall process.

    Over the next two years, the company improved its AI system, and it now fully manages the cooling of the server rooms. For example, the machine algorithm "figured out" that in winter the cold outside air cools the water in the chillers more strongly, and used this to optimize power consumption. This reduced energy costs by an additional 30%.

    Google believes that its development and similar systems will further help data center owners cut cooling costs at least in half and reduce CO2 emissions into the atmosphere.

    How it works

    Thousands of physical sensors monitor the entire cooling system in the company's data center. Data from them is fed to the AI system deployed in the cloud: a neural network with five hidden layers of 50 neurons each.
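A network of that shape can be sketched in a few lines. This is only an illustration of the architecture described above (19 inputs, five hidden layers of 50 neurons, one output); the random weights, the ReLU activation and the single PUE output are assumptions, not details of Google's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_inputs=19, hidden=(50,) * 5, n_outputs=1):
    """Random weights for a 19-input MLP with five hidden layers of 50 neurons."""
    sizes = (n_inputs, *hidden, n_outputs)
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes, sizes[1:])]

def predict_pue(params, x):
    """Forward pass: ReLU hidden layers, linear output (a predicted PUE value)."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b

params = init_mlp()
sensors = rng.random(19)                   # one sample of 19 normalized sensor readings
print(predict_pue(params, sensors).shape)  # (1,)
```

A trained version of such a network would map a snapshot of sensor readings to an expected PUE; here the untrained weights only demonstrate the data flow.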

    It works with 19 different parameters, including the total server load, the number of running water pumps, the outdoor humidity and even the wind speed. Every five minutes the system reads the sensors (approximately 184 thousand snapshots in total; 70% of them were used to train the network and the remaining 30% for validation) and uses them to optimize the PUE value.

    It generates a list of forecasts of how each possible change to the system would affect the data center's power consumption and the temperature in the server room. For example, changing the temperature of the "cold" aisle can cause fluctuations in the load on the chillers, heat exchangers and pumps, which in turn leads to non-linear changes in equipment performance.

    From this list, the system picks the actions that will reduce power consumption more than the others without disrupting the operation of the data center. These instructions are then sent back to the data center, where the local control system once again checks that they meet safety requirements (and that carrying them out will not lead to irreparable consequences).
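The select-the-best-safe-action step could look like the sketch below. The candidate actions, the predicted PUE and temperature values, and the safety limit are all invented for illustration; the article does not describe Google's actual action format.

```python
# Hypothetical candidate actions: each changes a cooling setting and carries a
# model-predicted PUE and room temperature. Lower PUE means less energy overhead.
candidates = [
    {"action": "raise cold-aisle setpoint 1C", "predicted_pue": 1.09, "predicted_temp": 24.5},
    {"action": "stop one water pump",          "predicted_pue": 1.07, "predicted_temp": 27.8},
    {"action": "reduce chiller load 10%",      "predicted_pue": 1.08, "predicted_temp": 25.1},
]

MAX_SAFE_TEMP = 26.0  # illustrative safety limit on the server-room temperature

# Keep only actions that stay within the safety limit, then take the lowest PUE.
safe = [c for c in candidates if c["predicted_temp"] <= MAX_SAFE_TEMP]
best = min(safe, key=lambda c: c["predicted_pue"])
print(best["action"])  # reduce chiller load 10%
```

Note that the pump-stopping action would save the most energy but is discarded because it would push the room above the safe temperature.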

    Since the AI systems now share responsibility for the smooth operation of services like Google Search, Gmail and YouTube, the developers provided a number of safeguards. Among them are algorithms that compute an uncertainty index: for each of the billions of possible actions, the AI system estimates its confidence and immediately discards actions with a low score (that is, a high probability of failure).
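The confidence filter described above amounts to thresholding: any action whose score falls below a cutoff never reaches the next stage. A hedged sketch, with invented confidence scores and an invented threshold:

```python
# Each candidate carries a confidence score from the model; actions below the
# threshold are discarded before anything else happens (values are illustrative).
CONFIDENCE_THRESHOLD = 0.9

candidates = [
    {"action": "raise setpoint", "confidence": 0.97},
    {"action": "stop pump",      "confidence": 0.55},  # too uncertain, dropped
    {"action": "reduce load",    "confidence": 0.93},
]

trusted = [c for c in candidates if c["confidence"] >= CONFIDENCE_THRESHOLD]
print([c["action"] for c in trusted])  # ['raise setpoint', 'reduce load']
```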

    Another safeguard is two-level verification. The optimal actions calculated by the ML algorithms are compared against the set of safety policies defined by the data center operators. Only if everything checks out are the changes applied to the air conditioning systems.
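The second verification level can be pictured as a list of operator-defined rules that every proposed action must satisfy. The specific rules below (a maximum setpoint and a pump-redundancy requirement) are invented examples, not Google's actual policies:

```python
# The local control system re-checks each action against operator policies
# before applying it; the rules here are illustrative.
def passes_local_policies(action):
    policies = [
        lambda a: a["setpoint_c"] <= 27.0,   # never exceed the maximum setpoint
        lambda a: a["pumps_running"] >= 2,   # always keep pump redundancy
    ]
    return all(rule(action) for rule in policies)

action = {"setpoint_c": 25.0, "pumps_running": 3}
if passes_local_policies(action):
    print("applied")  # applied
else:
    print("rejected; operators notified")
```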

    At the same time, operators are always ready to disable the "automatic" mode and take over control.

    Similar developments

    Google is not the only company developing machine learning solutions for managing data center cooling systems. For example, Litbit is working on its Dac technology for monitoring computing resource and power consumption.


    Dac uses IoT sensors to monitor the state of the equipment. The system can "hear" ultrasonic frequencies and "sense" abnormal floor vibrations. By analyzing this data, Dac determines whether all the equipment is working properly. When it finds a problem, the system notifies administrators, opens a ticket with technical support and, in a critical situation, can even shut down the hardware.
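The escalation logic described above (notify, open a ticket, shut down hardware) can be sketched as a tiered threshold check. The vibration metric and all threshold values are invented for illustration; Litbit does not publish Dac's internals.

```python
# Tiered response to an anomaly reading: the more severe the measurement,
# the stronger the reaction. Thresholds are purely illustrative.
def handle_vibration(amplitude_mm):
    if amplitude_mm < 0.5:
        return "ok"                    # normal operation
    if amplitude_mm < 2.0:
        return "notify admins"         # minor anomaly
    if amplitude_mm < 5.0:
        return "open support ticket"   # serious anomaly
    return "shut down hardware"        # critical situation

print(handle_vibration(3.1))  # open support ticket
```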

    A similar solution is being created by Nlyte Software, which has partnered with the IBM Watson IoT team. Their system collects data on temperature, humidity, electricity consumption and equipment utilization in the data center, and gives engineers tips on optimizing workflows. The solution works with both cloud and on-premises infrastructure.

    The adoption of AI systems in data centers will go beyond the usual DCIM solutions (data center infrastructure management software). Many IT industry experts believe that most data center processes will soon be automated. As a result, data center administrators will be able to concentrate on other, more important tasks that affect the growth and development of their companies.
