Downtime compensation options from Google and Telstra

    Data center downtime (the time during which a system is not working) causes not only financial damage but also harm to the brand's reputation. Downtime has many possible causes. Sometimes the infrastructure cannot cope with the load because of various faults (natural disasters, interruptions in the mains power supply, and so on). But the human factor is behind most of the errors that reduce the security and reliability of a data center. According to a WinMagic survey of about a thousand data center operators, the largest share of respondents (31%) consider employees with access to server farms to be the most serious threat to logical security. Curiously, hacker attacks come only second (30%).



    Google undertook to reimburse its customers up to 25% of their monthly charges after a failure of the Google Compute Engine cloud that lasted almost 20 minutes (for comparison, 99.9% uptime allows the service to be unavailable for roughly 43 to 45 minutes per month, depending on the month's length). According to the press release posted on the Google Cloud Platform website, changes to the network configuration were the root cause of the failure. When operators rolled out the changes, the configuration software detected a conflict. Trying to rectify the situation, the system attempted to return to the previous configuration and hit a previously unknown error that led to the crash. The outage was patched only after about 20 minutes, and the underlying problem remained unsolved, so Google developers had to put a lot of work into improving their systems.
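
    The downtime budget mentioned above is simple arithmetic. The following is a minimal sketch, not anything taken from Google's SLA: the function name `allowed_downtime_minutes` and the 30-day and 31-day month lengths are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(availability_pct: float, days_in_month: int = 30) -> float:
    """Return the monthly downtime budget, in minutes, for a given availability target.

    Hypothetical helper for illustration; real SLAs define their own
    measurement periods and exclusions.
    """
    total_minutes = days_in_month * 24 * 60
    # The budget is the fraction of the month the service is allowed to be down.
    return total_minutes * (100.0 - availability_pct) / 100.0


budget_30 = allowed_downtime_minutes(99.9, 30)
budget_31 = allowed_downtime_minutes(99.9, 31)
print(f"30-day month: {budget_30:.1f} min, 31-day month: {budget_31:.1f} min")

# Share of a 30-day budget consumed by a 20-minute outage.
outage_share = 20 / budget_30
print(f"A 20-minute outage uses {outage_share:.0%} of the 30-day budget")
```

    Under these assumptions the budget is about 43.2 minutes for a 30-day month and about 44.6 minutes for a 31-day month, so a single 20-minute outage uses up a little under half of it.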



    A similar story occurred in the data center of the Australian telecommunications company Telstra, where the notorious human factor disabled the entire data center. Unlike at Google, it took Telstra almost four hours to fix the problem, and the company's mobile network went offline. According to the Sydney Morning Herald, the incident was caused by an engineer who took a failed network node offline without first activating the backup node. This interrupted the mobile network and left many customers without service. The problem affected many cities in Australia, including Brisbane, Sydney, Melbourne, Adelaide and Perth. While it was being fixed, thousands of people voiced their dissatisfaction with the company on social networks.
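
    The kind of mistake described above (taking a node offline before its backup is active) is one that tooling can refuse to perform. The sketch below is purely hypothetical and does not reflect Telstra's actual systems: `Node`, `UnsafeOperation` and `take_offline` are names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical network node with a simple active/inactive state."""
    name: str
    active: bool


class UnsafeOperation(Exception):
    """Raised when an operation would leave no active node to carry traffic."""


def take_offline(node: Node, backup: Node) -> None:
    """Take `node` offline only if `backup` is already active."""
    if not backup.active:
        raise UnsafeOperation(
            f"refusing to take {node.name} offline: backup {backup.name} is not active"
        )
    node.active = False


primary = Node("primary", active=True)
standby = Node("standby", active=False)

try:
    take_offline(primary, standby)  # backup not active yet, so this is refused
except UnsafeOperation as err:
    print(err)

standby.active = True               # activate the backup first
take_offline(primary, standby)      # now the operation is allowed
```

    The point of the design is that the safe ordering is enforced by the tool rather than left to the operator's memory.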



    After the incident, several more outages occurred, though they were shorter and less serious. The company's management decided to compensate customers for the inconvenience. Telstra gave its subscribers a day of free unlimited mobile Internet (Free Data Day). By the end of the day, 2,686 TB of data had been downloaded, which naturally overloaded the network and lowered download speeds.

    Data center downtime causes financial damage and hurts the company's reputation. That is why it is so important for data center operators, designers and builders to do everything in their power to minimize it. Of course, no one can guarantee 100% safety, but using modern standards, preparing an action plan for unforeseen situations and keeping up with timely maintenance will minimize the risk of downtime.
