A lightning strike in Dublin this weekend caused a power failure in data centres belonging to Amazon and Microsoft, causing the companies' cloud services to go offline.
According to preliminary information Amazon posted on its Service Health Dashboard, lightning struck a transformer, sparking an explosion and fire that caused the power outage at 10:41 AM. Under normal circumstances, backup generators would seamlessly kick in, but the explosion also knocked out some of those generators.
By 1:56 PM, power to the majority of network devices had been restored, allowing Amazon to focus on bringing EC2 instances and EBS volumes back online. But progress was slower than expected, Amazon said a couple of hours later.
"We know many of you are anxiously waiting for your instances and volumes to become available, and we want to give you more detail on why the recovery of the remaining instances and volumes is taking so long," the company wrote. "Due to the scale of the power disruption, a large number of EBS servers lost power and require manual operations before volumes can be restored... While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours until the process is completed."
To speed up the recovery process, Amazon started adding more EBS capacity.
European customers of Microsoft's Business Productivity Online Standard Suite were also affected by the power outage, but services were restored to all customers by 5:45 PM, a Microsoft spokesman said.
Dutch company Layar, whose augmented reality platform has been running on Amazon's cloud for the past 18 months, was one of the affected companies.
"I've been continuously following #AWS on Twitter and looking at our AWS dashboards to see the progress. It's frustrating. There's nothing you can do except wait," said Dirk Groten, CTO at Layar.
In an interview on Monday morning local time, Groten said his company's service was up and running again and that he is still a believer in cloud services. Any data centre can experience an outage following a power failure, he said.
It was clear that Amazon was working hard to restore services, but the information the company provided didn't always match what Layar was seeing at its end, according to Groten. Layar is now waiting for Amazon to publish a final report on the incident, which should include what the company plans to do to prevent something similar from happening again, Groten said.