Overheating datacenter stopped 2.5 million bank transactions

Running infrastructure in the tropics has its challenges – but so do failed disaster recovery plans

Outages at two banks that stopped 2.5 million payment transactions were sparked by a technical issue with a datacenter's cooling system, the Monetary Authority of Singapore (MAS) said on Monday.

DBS and Citibank, the banks involved, experienced outages in the mid-afternoon of October 14, 2023 that resulted in full or partial unavailability of online banking apps for around two days – leaving customers and vendors without a way to make payments in a city-state that is increasingly reliant on digital financial systems.

In fact, according to minister Alvin Tan in a parliamentary reply, the outages led to 810,000 failed attempts to access the two platforms while 2.5 million payment and ATM transactions could not be completed.

The outages were traced to problems with the cooling system, which let temperatures rise above the optimal operating range at the Equinix datacenter used by both institutions.

Equinix has reportedly blamed a contractor, alleging that person "incorrectly sent a signal to close the valves from the chilled water buffer tanks" during a planned system upgrade.

When the outage hit, both banks immediately activated their IT disaster recovery and business continuity plans.

"However," according to Tan, "both banks encountered technical issues which prevented them from fully recovering their affected systems at their respective backup datacenters – DBS due to a network misconfiguration and Citibank due to connectivity issues."

Tan concluded that the two banks had "fallen short" of MAS requirements to ensure critical IT systems are resilient. The authority requires that unscheduled downtime for critical systems affecting bank operations not exceed four hours within any 12-month period – a limit easily exceeded in this case.
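
For readers who want to see how a rolling limit of that kind can be checked, here is a minimal sketch in Python. It is not MAS's method or either bank's tooling. It assumes "any 12-month period" can be approximated as a sliding 365-day window, that outage records do not overlap, and the function names and sample timestamps are hypothetical, chosen only to mirror the "around two days" described above.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=365)  # assumption: "12 months" approximated as 365 days
LIMIT = timedelta(hours=4)    # the four-hour limit cited in the article


def overlap(start, end, win_start, win_end):
    """Length of the part of [start, end] that falls inside [win_start, win_end]."""
    latest_start = max(start, win_start)
    earliest_end = min(end, win_end)
    return max(earliest_end - latest_start, timedelta(0))


def worst_window_downtime(outages):
    """Largest total downtime found in any WINDOW-long span.

    `outages` is a list of (start, end) datetime pairs, assumed non-overlapping.
    The total inside a sliding window can only peak when the window ends at an
    outage's end or starts at an outage's start, so only those are checked.
    """
    window_ends = [end for _, end in outages]
    window_ends += [start + WINDOW for start, _ in outages]
    worst = timedelta(0)
    for win_end in window_ends:
        win_start = win_end - WINDOW
        total = sum(
            (overlap(s, e, win_start, win_end) for s, e in outages),
            timedelta(0),
        )
        worst = max(worst, total)
    return worst


def breaches_limit(outages):
    """True if some window contains more downtime than LIMIT."""
    return worst_window_downtime(outages) > LIMIT


# Hypothetical timestamps: one outage of roughly two days.
two_day_outage = [(datetime(2023, 10, 14, 15), datetime(2023, 10, 16, 15))]
print(breaches_limit(two_day_outage))
```

Under those assumptions a single two-day outage is twelve times the limit on its own, while two three-hour incidents would breach it only if they fell within the same 365-day span.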

As a result, the MAS has slapped DBS with some hefty punishments – including barring it over the next six months from reducing the size of its branch and ATM network, making any non-essential IT changes, or acquiring new business ventures.

DBS is also required to apply a multiplier of 1.8 times to its risk-weighted assets for operational risk.

Singapore-based Acronis CISO Kevin Reed told The Register that both the cooling system's lack of redundancy and the banks' failed backup plans were surprising.

"As is often the case, an incident is not a single failure, but a chain of interconnected events. Usually, a properly implemented failover takes seconds or minutes," he lamented to The Reg on Tuesday.

MAS does not oversee external providers to banks, such as Equinix.

Overheating of a datacenter is an unfortunate event in a country that literally wrote the standard for datacenter operations in the tropics.

Such standards – as well as successful implementation of them – will only become more important as extreme weather takes hold globally.

In addition to the need to keep datacenters cool and backup systems resilient, it's vital to recognize the challenges inherent in full reliance on digital systems, according to Tan.

"While we want to be digital first (in) our approach to digitalization, we cannot be digital only," he declared in parliament on Monday.

He urged consumers and businesses to recognize that these systems sometimes fail, and to keep other forms of payment available so that economic activity can continue. ®
