
Rackspace datacenter infrastructure took 12-hour nap in London, Sydney, Hong Kong

Borked SANs, not a security SNAFU, identified as the cause. Services are back, but Linux VMs must reboot


Updated: Rackspace is in a mess again.

The cloudy concern's status page reports outages in its SYD2, LON5, LON3, and HKG5 datacenter infrastructure across May 29 and 30.

Rackspace's first incident report is timestamped 29 May 22:24 CDT.

A subsequent update pinned the issue on Dense Wavelength-Division Multiplexing (DWDM) equipment in London, part of the fiber transport network that lets Rackspace carry traffic between its datacenters and internet service providers.

But an hour later Rackspace ruled out DWDM as a cause of the incident. The company has not updated its status page since.

The Register has obtained an email that a SaaS company hosted on Rackspace sent to its customers.

"Our hosting provider Rackspace have confirmed they are experiencing connectivity issues," the email opens. "All available engineers have been engaged and are working to resolve the issue with the highest priority."

It gets worse: Rackspace has warned customers of its London datacenters that whatever's causing the issue may disrupt their backups, and offered instructions on how to detect any failures.
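
Those instructions aren't public, but checks of this kind generally boil down to comparing each backup job's last successful run against its schedule. A minimal sketch in Python, in which the manifest path, its format, and the daily threshold are all assumptions for illustration rather than Rackspace's actual guidance:

# Minimal sketch: flag any backup job whose last success is older than
# its expected interval. The manifest file, its layout, and the 24-hour
# threshold are assumed, not Rackspace's published procedure.
import json
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # assume daily backups

def stale_backups(manifest_path="backup_status.json"):
    """Return job names whose last success exceeds the allowed age."""
    with open(manifest_path) as f:
        jobs = json.load(f)  # assumed format: {"job-name": last_success_epoch}
    now = time.time()
    return [name for name, last_ok in jobs.items()
            if now - last_ok > MAX_AGE_SECONDS]

for job in stale_backups():
    print(f"backup job {job} has not succeeded in the last 24 hours")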

At the time of writing – 02:45 CDT on May 30 – Rackspace had not updated its status page for over an hour. The Register has sought comment and will update this story if we receive useful information.

This outage comes at a terrible time for Rackspace as its US and UK customers emerge from a holiday weekend.

The company is also far from out of the woods after the December 2022 attack on its Hosted Exchange environment caused weeks of disruption and saw the service abandoned.

That incident left customers unable to access their data for a protracted period, again with terrible timing as they prepared for the festive season. Class actions are under way to give aggrieved customers a chance at compensation.

And now Rackspace customers on three continents have a new set of worries. ®

Updated at 23:00 UTC, May 30

Rackspace has identified the cause of the problem: "I/O limits in the multi-tenant Shared SAN environment had reset incorrectly."

Rackspace ran a script to reset the value and, as of 12:10 CDT, services were restored – with one class of exceptions, described below.
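
Rackspace hasn't published the script, but the fix it describes – walk the shared SAN's volumes and restore any I/O limit that has drifted from its intended value – might look something like this sketch, in which san_client and its methods are invented stand-ins rather than a real Rackspace API:

# Hypothetical remediation pass: restore each volume's I/O limit to the
# intended cap. san_client, get_volumes, and set_iops_limit are invented
# stand-ins, not Rackspace tooling; the 10,000 IOPS value is assumed.
EXPECTED_IOPS_LIMIT = 10_000

def reset_io_limits(san_client):
    """Reset any volume whose I/O limit no longer matches the expected cap."""
    for volume in san_client.get_volumes():  # hypothetical call
        if volume.iops_limit != EXPECTED_IOPS_LIMIT:
            san_client.set_iops_limit(volume.id, EXPECTED_IOPS_LIMIT)
            print(f"reset {volume.id}: {volume.iops_limit} -> {EXPECTED_IOPS_LIMIT}")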

"It has been identified that any impacted Linux VMs (virtual machines) will not automatically recover if storage has been adjusted and will need to be manually rebooted. Rackspace engineers can reboot impacted VMs from the portal where necessary" states Rackspace's status update.

A Rackspace spokesperson told us the incident is not considered a security matter.
