Update (July 2, 2018): Google Cloud Platform (GCP) support staff have assured us that this will not happen again. In their words: “Many people within GCP are interested in improving the situation, not only for you, but for all customers.”

Note: this post is not about the quality of Google's cloud services. They are excellent, on par with AWS. It is about "sudden moves without warning": Google completely shuts down all of your systems whenever its employees (or its machines) suddenly decide that something is wrong. This has now happened to us for the second time.


Our production project uses GCP to monitor hundreds of wind power plants (WPPs) and dozens of solar power plants spread across eight countries. Our control centers have walls covered in screens showing dashboards packed with metrics that are watched around the clock. Facility managers use the system to monitor the state of individual wind turbines and solar installations in real time, and if intervention is required, it happens immediately. Our development and forecasting teams use the system to refine data algorithms in BigQuery. All of this translates directly into our profits. We deal in wind and solar energy, a perishable commodity: if we generate a surplus, we cannot store it and sell it later, and if we generate too little, we pay penalties.

What happened

Early this morning (June 28, 2018) I received an alert from an uptime bot that the entire site had gone offline. Then came a flurry of emails from Google saying that "suspicious activity was detected" and that all my systems had been shut down. EVERYTHING WAS OFF. A MACHINE SHUT US DOWN WITHOUT WARNING. The site was down, App Engine and the databases were unavailable, and several messages from Firebase said that my plan had been downgraded and that its limits were therefore exceeded.

Customer support chat was offline. There is no phone support. An email arrived asking us to fill out a form and upload a photo of the credit card along with a government-issued photo ID of the cardholder. Great, let's wake up the CFO, who owns the card.

We will delete your project within three business days

“We will delete your project unless the account holder corrects the violation by filling out an account verification form within three business days. This form confirms your identity and ownership of the payment instrument. Failure to submit the requested documents may result in the permanent closure of your account.”

What if the cardholder is on vacation and unreachable for three days? We would lose everything: years of work and millions of dollars in revenue.

I filled out the form with the details and, fortunately, within 20 minutes all services began coming back to life. The first time this happened, the downtime lasted several hours. In all, this time we lost access to everything for about an hour. An automatic email then arrived apologizing for the inconvenience. Unfortunately, the machine has no idea how much "inconvenience" it actually caused.

You cannot just shut everything down and then ask for explanations

I understand that Google needs to monitor for and prevent suspicious activity. But what really matters is what you do after detecting it. That step requires human involvement, which no amount of code or AI can replace. You cannot just shut everything down and then ask for explanations. It needs to be the other way around.

This is the first project we have built entirely on Google Cloud. All our previous projects ran on AWS. In our experience, AWS is far more humane in handling billing problems: they warn you about suspicious activity and give you time to explain and sort things out. They don't just kick you out the door.

I hope the GCP team will listen and change things for the better. Until then, I will never host another project on GCP.

