
Google fully explains what caused Monday’s multi-service outage

Google started the week with a big outage that took down Gmail, Drive, and all other Workspace apps. As promised, Google has now published a detailed explanation of the outage and the steps it will take to prevent future incidents.

At a high level, the issue stems from ongoing work to update Google’s account authentication system. While that effort was underway, previous components were “left in place.” Keeping those older components resulted in an error that reported usage as 0, so Google instituted a grace period to delay the impact.

That grace period eventually expired, and automated systems then responded to the erroneous reading as if it were real. Since usage appeared to be at 0, capacity for the identity management system was scaled down. Safety checks were in place, but they were not designed to cover this specific problem.
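The failure mode described above can be illustrated with a minimal sketch. This is not Google's actual system: the `QuotaState` type, the `next_allocation` function, and the `min_fraction` floor are hypothetical names chosen for illustration. The sketch assumes an automated allocator that follows reported usage once a grace period ends, and shows how one kind of safety check, a cap on how far capacity may fall in a single step, would blunt a false reading of 0.

```python
from dataclasses import dataclass


@dataclass
class QuotaState:
    """Hypothetical view of one service's quota (illustrative names only)."""
    allocated: int        # capacity currently granted to the service
    reported_usage: int   # usage as reported by monitoring (may be wrong)
    grace_active: bool    # True while acting on new readings is deferred


def next_allocation(state: QuotaState, min_fraction: float = 0.5) -> int:
    """Return the capacity to grant next.

    While the grace period is active, the current allocation is kept.
    Afterwards the allocation follows reported usage, but never falls
    below `min_fraction` of the current allocation in one step.
    """
    if state.grace_active:
        # Readings are ignored for now, so a bogus 0 has no effect yet.
        return state.allocated
    floor = int(state.allocated * min_fraction)
    return max(state.reported_usage, floor)


# During the grace period, the false reading of 0 is harmless.
during_grace = next_allocation(QuotaState(1000, 0, grace_active=True))

# After expiry with no floor, capacity is scaled down to nothing.
unguarded = next_allocation(QuotaState(1000, 0, grace_active=False), min_fraction=0.0)

# After expiry with a 50% floor, the drop is limited to one bounded step.
guarded = next_allocation(QuotaState(1000, 0, grace_active=False))
```

In this toy model the grace period only postpones the problem: once it ends, the allocator trusts the reading of 0 unless a check such as the floor limits the change.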

The issue started impacting users at 3:47 a.m. PT, and engineers were alerted a minute later. “Workspace apps were down for the duration of the incident” since they rely on the impacted infrastructure to make sure you’re logged in, authenticated, and authorized to see content, like emails and documents.

At 04:08 the root cause and a potential fix were identified, which led to disabling the quota enforcement in one datacenter at 04:22. This quickly improved the situation, and at 04:27 the same mitigation was applied to all datacenters, which returned error rates to normal levels by 04:33.

The company laid out plans to review, improve, and evaluate its systems to prevent similar issues. Google ended its outage explanation with an apology:

“We would like to apologize for the scope of impact that this incident had on our customers and their businesses. We take any incident that affects the availability and reliability of our customers extremely seriously, particularly incidents which span multiple regions.”

The full technical explanation is available here.



Author: Abner Li
Source: 9TO5Google
