
Rackspace datacenter infrastructure took 12-hour nap in London, Sydney, Hong Kong

Borked SANs, not a security SNAFU, identified as the cause. Services are back, but Linux VMs must reboot


Updated: Rackspace is in a mess again.

The cloudy concern's status page reports outages in its SYD2, LON5, LON3, and HKG5 datacenter infrastructure across May 29 and 30.

Rackspace's first incident report is timestamped 29 May 22:24 CDT.

A subsequent update pinned the issue on Dense Wavelength-Division Multiplexing (DWDM) equipment in London, part of the fiber transport network that lets Rackspace carry traffic between its datacenters and internet service providers.

But an hour later Rackspace ruled out DWDM as a cause of the incident. The company has not updated its status page since.

The Register has obtained an email that a SaaS company hosted on Rackspace sent to its customers.

"Our hosting provider Rackspace have confirmed they are experiencing connectivity issues," the email opens. "All available engineers have been engaged and are working to resolve the issue with the highest priority."

It gets worse: Rackspace has warned customers of its London datacenters that whatever's causing the issue may disrupt their backups, and offered instructions on how to detect any failures.
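
Those instructions aren't public, but checks of this kind generally boil down to comparing each backup job's last successful run against its schedule. A minimal sketch in Python, in which the manifest path, its format, and the daily threshold are all assumptions for illustration rather than Rackspace's actual guidance:

# Minimal sketch: flag any backup job whose last success is older than
# its expected interval. The manifest file, its layout, and the 24-hour
# threshold are assumed, not Rackspace's published procedure.
import json
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # assume daily backups

def stale_backups(manifest_path="backup_status.json"):
    """Return job names whose last success exceeds the allowed age."""
    with open(manifest_path) as f:
        jobs = json.load(f)  # assumed format: {"job-name": last_success_epoch}
    now = time.time()
    return [name for name, last_ok in jobs.items()
            if now - last_ok > MAX_AGE_SECONDS]

for job in stale_backups():
    print(f"backup job {job} has not succeeded in the last 24 hours")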

At the time of writing – 02:45 CDT on May 30 – Rackspace had not updated its status page for over an hour. The Register has sought comment and will update this story if we receive useful information.

This outage comes at a terrible time for Rackspace as its US and UK customers emerge from a holiday weekend.

The company is also far from out of the woods after the December 2022 attack on its Hosted Exchange environment caused weeks of disruption and saw the service abandoned.

That incident left customers unable to access their data for a protracted period, again with terrible timing as they prepared for the festive season. Class actions are under way to give aggrieved customers a chance at compensation.

And now Rackspace customers on three continents have a new set of worries. ®

Updated at 23:00 UTC, May 30

Rackspace has identified the cause of the problem: "I/O limits in the multi-tenant Shared SAN environment had reset incorrectly."

Rackspace ran a script to reset the value and, as of 12:10 CDT, services were restored – with one class of exceptions, described below.
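
Rackspace hasn't published the script, but the fix it describes – walk the shared SAN's volumes and restore any I/O limit that has drifted from its intended value – might look something like this sketch, in which san_client and its methods are invented stand-ins rather than a real Rackspace API:

# Hypothetical remediation pass: restore each volume's I/O limit to the
# intended cap. san_client, get_volumes, and set_iops_limit are invented
# stand-ins, not Rackspace tooling; the 10,000 IOPS value is assumed.
EXPECTED_IOPS_LIMIT = 10_000

def reset_io_limits(san_client):
    """Reset any volume whose I/O limit no longer matches the expected cap."""
    for volume in san_client.get_volumes():  # hypothetical call
        if volume.iops_limit != EXPECTED_IOPS_LIMIT:
            san_client.set_iops_limit(volume.id, EXPECTED_IOPS_LIMIT)
            print(f"reset {volume.id}: {volume.iops_limit} -> {EXPECTED_IOPS_LIMIT}")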

"It has been identified that any impacted Linux VMs (virtual machines) will not automatically recover if storage has been adjusted and will need to be manually rebooted. Rackspace engineers can reboot impacted VMs from the portal where necessary" states Rackspace's status update.

A Rackspace spokesperson told us the incident is not considered a security matter.
