Why HDDs are becoming less likely to fail

    Hard drive damage is one of the main causes of server downtime in data centers. Recently, though, the number of HDD failures has been declining. Here is why.

    / photo William Warby CC BY




    A brief retrospective


    Twenty years ago, the hard drive was one of the “weak points” of a computer or server. A well-known example is the IBM Deskstar series, whose drives failed even after brief use. They are considered among the most unreliable commercial HDDs and earned the nickname “Deathstar”.

    The Deskstar cast a long shadow over the hard drive industry. Many manufacturers voluntarily shortened the warranty periods on their devices, in some cases from three years to one. Over time, however, new technologies improved HDD reliability. According to research by one of the largest Western cloud providers, the annualized failure rate (AFR) of hard drives in its data center was 1.25% in 2018. For comparison, the AFR was 1.95% in 2016 and 1.77% in 2017.
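
    The article does not say how the provider computes its AFR. A common definition is the number of failures per drive-year of operation, expressed as a percentage. The sketch below assumes that definition; the function name and the fleet numbers are hypothetical and chosen only to reproduce a figure of 1.25%.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """Return AFR as a percentage: failures per drive-year of operation.

    Assumes the common "failures / drive-years" definition, which the
    article itself does not spell out.
    """
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Hypothetical fleet: 10,000 drives running for a full year, 125 failures.
afr = annualized_failure_rate(failures=125, drive_days=10_000 * 365)
print(f"{afr:.2f}%")  # prints "1.25%"
```

    Counting drive-days rather than drives keeps the figure comparable between years in which drives were added or retired partway through.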

    Industry experts attribute the drop in HDD failures to technological advances both in the drives themselves and in data centers. Let’s look at some of them.

    Helium chambers


    Some manufacturers now fill their HDDs with helium. Helium is about seven times less dense than air. This reduces the friction acting on moving components and weakens the gas flows that disturb the positioning accuracy of the read heads. The technology also eliminates the risk of corrosion, because the helium atmosphere inside the drive contains no water vapor. All of this extends the expected service life of the drives.

    According to an HGST study conducted several years ago and based on statistics from Netflix, Huawei and HP, helium drives last twice as long as conventional HDDs. As a result, sales of helium drives are growing year over year, and the devices are increasingly used in the data centers of cloud providers.

    Improving data center conditions


    Industry experts name better data center conditions as another reason for improved HDD reliability. The service life of a hard drive depends directly on its ambient temperature. Seagate notes that 30 °C is optimal. Above 50 °C or below 5 °C, the number of failures rises significantly.

    IT companies are therefore developing new cooling solutions that keep the server room at the optimal temperature. For example, Facebook introduced evaporative cooling for its data centers. Water for the system is cooled in a special heat exchanger by evaporating through a membrane layer, and the chilled water is then used to lower the temperature in the server room.

    Alongside new cooling systems, tools for managing them are also being developed, including ones based on machine learning. Such systems use sensors that collect temperature data outside and inside the data center. The control module then uses this information to adjust the ventilation: it regulates the temperature by drawing in more or less outside air.
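
    To make the control loop concrete, here is a deliberately simplified sketch. It is not a machine learning model: a plain proportional rule stands in for the learned controller, and the function name, the gain value and the 30 °C default target are assumptions made for illustration only.

```python
def outside_air_fraction(inside_c: float, outside_c: float,
                         target_c: float = 30.0, gain: float = 0.2) -> float:
    """Return the share of intake air to draw from outside, from 0.0 to 1.0.

    A proportional rule used as a stand-in for a learned controller.
    All parameters are illustrative assumptions.
    """
    if outside_c >= inside_c:
        return 0.0  # outside air is no cooler than the room, so it cannot help
    error = inside_c - target_c  # positive when the room is too warm
    # Open the intake in proportion to the overshoot, clamped to [0, 1].
    return max(0.0, min(1.0, gain * error))


print(outside_air_fraction(inside_c=32.0, outside_c=10.0))  # 0.4
print(outside_air_fraction(inside_c=40.0, outside_c=10.0))  # 1.0
print(outside_air_fraction(inside_c=28.0, outside_c=10.0))  # 0.0
```

    A learned controller would replace the fixed gain with a model fitted to historical sensor data, but the inputs (inside and outside temperature) and the output (how much outside air to take in) are the same as described above.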

    We wrote in more detail about how AI systems help cool data centers in one of our blog posts.

    Development of “internal” HDD technologies


    Humidity also affects the number of HDD failures. It determines the height at which the read head can safely fly without damaging the magnetic surface. To address this, drive manufacturers are introducing technologies that adjust the movement of the head assembly to the operating conditions.

    One example is the rotational vibration (RV) sensor. Based on its readings, the built-in control module changes how the head assembly moves so that vibration is redistributed to the drive body. RV sensors are common in drives from Seagate, Toshiba and Western Digital that are designed for disk arrays.


    / photo meanwhile dan PD

    On the reliability of alternative drives


    The main competitor to hard drives today, including in data centers, is the SSD. Statistically, SSDs fail less often than HDDs. However, as SSDs age, their read errors grow twice as fast. To address this, SSD manufacturers are developing error correction methods intended to increase the reliability and service life of the devices.

    One such method is SSD refresh (p. 32). If individual cells of the drive are not accessed for a long time, they begin to lose charge, which can cause data loss. The drive controller therefore periodically reads the data in idle cells, evaluates their current state and “recharges” them.
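
    The refresh idea can be sketched as a periodic scan. This is a toy model, not real controller firmware: the `Block` type, the `refresh_stale_blocks` function and the age threshold are hypothetical, and the time since the last write is used as a simple proxy for the cell state that a real controller would evaluate.

```python
from dataclasses import dataclass


@dataclass
class Block:
    data: bytes
    last_written: float  # timestamp of the last write, in seconds


def refresh_stale_blocks(blocks: list[Block], now: float,
                         max_age_s: float) -> list[int]:
    """Rewrite every block older than max_age_s; return the refreshed indices.

    Age is an assumed stand-in for the charge level of the cells.
    """
    refreshed = []
    for index, block in enumerate(blocks):
        if now - block.last_written > max_age_s:
            # Read the data back and write it again to restore the charge.
            block.data = bytes(block.data)
            block.last_written = now
            refreshed.append(index)
    return refreshed


blocks = [Block(b"cold data", last_written=0.0),
          Block(b"hot data", last_written=90.0)]
print(refresh_stale_blocks(blocks, now=100.0, max_age_s=50.0))  # [0]
```

    Only the block that has sat untouched past the threshold is rewritten; recently written data is left alone, which limits the extra wear that refresh writes cause.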

    Another technology still used in data centers is magnetic tape. In the Data Storage Trends report for 2018, tape ranked as the fourth most popular storage option after HDD, SSD and cloud (the ranking covers the options companies use to store their data, not storage methods as such). Magnetic tape is used mainly for its reliability: errors occur four to five orders of magnitude less often than on HDDs.

    At the same time, new technologies are still being developed to extend the life of tape. In 2017, IBM and Sony created a magnetic tape with an additional protective “lubricant” layer. This layer reduces the risk of damage when the tape moves at 10 meters per second.

    There are also more experimental storage technologies whose reliability could, in theory, far exceed that of conventional drives. For example, the IT community sees great potential in DNA molecules as a long-term storage medium.

    The creators of DNA storage plan to seal the molecules in glass capsules, isolating them from harmful environmental conditions. This would allow the digital data encoded in them to be stored for thousands of years without errors. Such a storage medium may become a reality in the coming years: Microsoft was planning to introduce DNA storage in one of its data centers.

    But such solutions are still experimental and not designed for widespread use. For now, hard drives will remain one of the most popular ways of storing information in data centers. And given that their reliability is growing, the HDD will stay with us for a long time.


