Delta lost $150 million because of the whim of a data center emergency generator manufacturer



Last year, Delta Air Lines lost more than $150 million. The cause was a failure in one of its data centers, which we covered at the time. Many thousands of Delta passengers were unable to fly because of an outage at the company's data center in Atlanta, USA. Like almost any company, Delta Air Lines has backup systems in its data centers that take over if something goes wrong. Tens of millions of US dollars had been invested in these systems, but when they were needed, they did not work properly.

At the time, power was not switched from the main supply to the backup generator, and the servers simply shut down once the UPS batteries were exhausted. This disrupted the operation of the company's data center. This month, almost a year later, Amazon Web Services Vice President James Hamilton analyzed what happened. In particular, he said that the problem arose from several rare failures occurring in sequence, but that such combinations happen much more often than is commonly thought.

He has seen this rare combination of circumstances twice before in his career, and the Delta case is the third. Moreover, it is the most instructive one. First, its consequences were severe. Second, the incident has already been analyzed and explained. Third, such events really do not happen often, so few people are prepared when the “X-hour” arrives.

To recall, Delta had to cancel 1,000 flights on the first day, 775 on the second and 90 on the third. As mentioned above, the company lost about $150 million, and since airline profit margins are thin to begin with, it will take several years to make up for the loss.

Incidentally, data center failures happen far more often than is publicly reported. In this case, however, everything came out: the airline could not have concealed anything even if it had wanted to.

So what actually happened? According to the report, "the mechanism for switching from the main power supply to emergency power failed, so the backup system did not turn on." To understand the nature of the problem, it helps to recall what equipment is usually used for this switchover.

Normally, electricity enters the data center through medium-voltage transformers and switching automation and reaches the uninterruptible power supplies (UPS), which are the final source of power for critical equipment such as servers, storage systems and network equipment. In this normal mode, the automation simply monitors the quality of the incoming power.


A Delta Air Lines employee helps a passenger whose flight was canceled make sense of the situation.

If the automation detects a failure, it usually waits a few seconds to see whether the supply recovers. If power is still absent or its parameters are out of spec, the emergency generators are started; a few seconds are also enough for a generator to come up. Once the generator reaches its operating mode and all parameters of its output match the setpoints, the load is switched to the generator and disconnected from the main supply. During these few seconds, while the automation assesses the situation and acts, the UPS batteries supply the necessary current; without them the switchover would not be possible. As soon as the main supply recovers, the load is switched back.
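The switchover sequence described above can be sketched as a small state machine. This is a minimal Python sketch, not any vendor's actual controller logic: the 480 V nominal voltage, the ±10% tolerance, the 3-second confirmation delay and the 8-second generator warm-up are hypothetical values chosen for illustration, and the input is assumed to be one utility voltage reading per second.

```python
# Minimal sketch of automatic transfer switch (ATS) sequencing.
# All thresholds below are hypothetical illustration values, not vendor settings.
NOMINAL_V = 480.0      # assumed nominal supply voltage, volts
TOLERANCE = 0.10       # assumed acceptable deviation from nominal (+/-10%)
CONFIRM_DELAY_S = 3    # assumed wait before declaring a utility failure
GEN_WARMUP_S = 8       # assumed time for the generator output to stabilize


def in_spec(voltage: float) -> bool:
    """Return True if the voltage is within the assumed tolerance band."""
    return abs(voltage - NOMINAL_V) <= NOMINAL_V * TOLERANCE


def simulate_transfer(utility_v: list[float]) -> list[tuple[int, str]]:
    """Replay one utility voltage sample per second and log ATS actions.

    States: "utility" (normal), "battery" (UPS carrying the load while the
    automation waits or the generator warms up), "generator" (transferred).
    """
    events: list[tuple[int, str]] = []
    state = "utility"
    bad_since = None    # second at which the utility first went out of spec
    gen_start = None    # second at which the generator start was commanded
    for t, v in enumerate(utility_v):
        ok = in_spec(v)
        if state == "utility":
            if not ok:
                state, bad_since = "battery", t
                events.append((t, "utility lost, UPS on battery"))
        elif state == "battery":
            if ok:
                # Utility recovered before the generator took the load.
                state, bad_since, gen_start = "utility", None, None
                events.append((t, "utility restored before transfer"))
            elif gen_start is None and t - bad_since >= CONFIRM_DELAY_S:
                gen_start = t
                events.append((t, "generator start commanded"))
            elif gen_start is not None and t - gen_start >= GEN_WARMUP_S:
                state = "generator"
                events.append((t, "load transferred to generator"))
        elif ok:  # state == "generator" and the utility is back in spec
            state, bad_since, gen_start = "utility", None, None
            events.append((t, "load retransferred to utility"))
    return events


if __name__ == "__main__":
    # 2 s of normal supply, a 20 s outage, then 3 s of restored supply.
    samples = [480.0] * 2 + [0.0] * 20 + [480.0] * 3
    for second, action in simulate_transfer(samples):
        print(second, action)
```

With these assumed settings, the UPS batteries carry the load from second 2 until the transfer at second 13. If the generator never reaches the transfer step, the batteries are the only thing keeping the servers running.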

In most cases, everything goes as it should. Problems are so rare that the vast majority of companies never experience a failure of their power automation. But when the automation does fail, the company can face disruptions and losses, as Delta did. How can it fail? Generator manufacturers use special software that monitors the network voltage during a failure. If the voltage is too high, or something else is wrong from the automation's point of view, the generator simply does not start. The reason is that a generator can cost a million dollars or more, and the manufacturer considers it safest not to put it at risk.
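The manufacturer-side protection described above amounts to a start interlock: if any protective condition trips, the start request is refused, and the operator has no way to overrule it. A minimal Python sketch follows; the 110% overvoltage limit and the fault names are hypothetical, since the article does not say which conditions Delta's controller actually checked.

```python
# Minimal sketch of a generator start interlock that prioritizes protecting
# the generator. The limit and fault names are hypothetical, not vendor values.
NOMINAL_V = 480.0
MAX_BUS_V = NOMINAL_V * 1.10  # assumed overvoltage trip point
BLOCKING_FAULTS = {"breaker_fault", "ground_fault", "sync_failure"}  # hypothetical


def start_permitted(bus_voltage: float, active_faults: set[str]) -> tuple[bool, list[str]]:
    """Return (permitted, reasons). Any single reason blocks the start."""
    reasons = []
    if bus_voltage > MAX_BUS_V:
        reasons.append("bus overvoltage")
    # Only faults on the blocking list matter; other warnings are ignored.
    reasons.extend(sorted(active_faults & BLOCKING_FAULTS))
    return (not reasons, reasons)


if __name__ == "__main__":
    print(start_permitted(480.0, set()))                  # normal conditions
    print(start_permitted(600.0, {"low_fuel_warning"}))  # overvoltage blocks the start
```

Note that nothing in this logic weighs what happens to the data center if the generator stays off; that is exactly the gap the rest of the article is about.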

But in some cases, a million dollars is nothing compared with the total loss from an outage, so data center engineers may prefer to run the generator even if it is likely to be damaged. In Delta's case, the technicians could do nothing, because the automation decided to lock out the expensive generator (recall that, as noted at the beginning, tens of millions of US dollars had been invested in the backup system). Within 5-10 minutes the UPS batteries were drained, and the servers and other equipment shut down. Delta also had a fire.
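The 5-10 minute window follows from simple arithmetic: UPS runtime is roughly the usable stored energy divided by the load. A minimal Python sketch, using a hypothetical 1 MW load, 125 kWh of usable battery energy and 96% inverter efficiency (none of these figures come from Delta):

```python
def ups_runtime_minutes(usable_kwh: float, load_kw: float, efficiency: float = 0.96) -> float:
    """Estimate UPS ride-through time in minutes.

    Simplified model: runtime = usable energy * inverter efficiency / load.
    Real batteries deliver less energy at high discharge rates, so treat
    the result as an optimistic upper bound.
    """
    if load_kw <= 0:
        raise ValueError("load_kw must be positive")
    hours = usable_kwh * efficiency / load_kw
    return hours * 60


if __name__ == "__main__":
    # Hypothetical site: 1 MW load, 125 kWh of usable battery energy.
    print(round(ups_runtime_minutes(125.0, 1000.0), 1))
```

With these assumed figures the estimate is about 7.2 minutes, inside the 5-10 minute range; halving the battery bank halves the window. Either way, the batteries only buy time for the generator to start. They cannot replace it.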

Where does Amazon come in? The vice president of this company once faced a similar problem. He had left the data center and was some distance away when messages about UPS shutdowns began arriving one after another. When he returned, he realized what had happened: the situation was similar to Delta's, only without a fire. Surprisingly, the automation manufacturer refused to help remove the lockout from the generator and start it, even though the data center team was ready to accept the risk of damaging the equipment. As a result, Amazon also suffered losses, although not as large as Delta's. Afterwards, Amazon worked with the automation manufacturer to create custom software that starts the generator in any failure scenario when the situation requires it.
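The change in priorities can be illustrated by adding an owner-controlled override on top of a protective interlock: protective trips are still reported, but when the operator has opted in and an outage would cost more than the generator, the generator starts anyway. This is a hypothetical Python sketch of that priority, not Amazon's actual custom software, whose logic the article does not describe; the thresholds, fault names and cost inputs are assumptions.

```python
# Hypothetical sketch: protective checks still run, but an owner-enabled
# override lets the generator start when an outage would cost more than
# the generator itself. Values and names are illustrative assumptions.
MAX_BUS_V = 528.0                                    # assumed: 110% of 480 V
BLOCKING_FAULTS = {"breaker_fault", "ground_fault"}  # hypothetical fault names


def decide_start(bus_voltage: float, active_faults: set[str],
                 outage_cost_usd: float, generator_cost_usd: float,
                 override_enabled: bool = True) -> tuple[bool, str]:
    """Return (start_generator, explanation)."""
    reasons = []
    if bus_voltage > MAX_BUS_V:
        reasons.append("bus overvoltage")
    reasons.extend(sorted(active_faults & BLOCKING_FAULTS))
    if not reasons:
        return True, "start: no protective trips"
    summary = ", ".join(reasons)
    # Risking the generator is acceptable only if losing the site costs more.
    if override_enabled and outage_cost_usd > generator_cost_usd:
        return True, f"start despite: {summary}"
    return False, f"lockout: {summary}"


if __name__ == "__main__":
    # Delta-scale stakes versus a generator costing roughly $1 million.
    print(decide_start(600.0, set(), outage_cost_usd=150e6, generator_cost_usd=1e6))
```

Setting `override_enabled=False` reproduces the generator-first behavior. The cost comparison is only a stand-in for the operator's judgment that downtime is the more expensive risk.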

In most cases, the generator will operate normally, although it can also run at a load slightly above its rating. Protecting it during a data center power outage makes no sense; that is the wrong priority. When hundreds of millions of US dollars are at stake, losing a few hundred thousand or even a million more does not matter much. In Delta's case, locking out the generator led to the consequences described above and a loss not of a few hundred thousand dollars, but of about $150 million.

