Off-Prem

PaaS + IaaS

Microsoft admits Azure Resource Manager failed after code change

Just lucky Western Europe was asleep when it happened


Any insomniacs, workaholics or those pulling an all-nighter related to a past deadline project may have noted a four-and-a-half hour failure of Azure Resource Manager in Europe this morning following a recent code change.

Should this have happened later in the day, admins would have pulled their hair in frustration over the impact to efficiency, but Microsoft appears to have dodged a bullet on this occasion.

Azure Status History website confirms:

"Between 02.41 UTC and 07.10 UTC on 23 Mar (sic) 2023 you may have experienced issues using Azure Resource Manager in West Europe when performing resource management operations. This would have impacted users of Azure Portal, Azure CLI, Azure PowerShell, as well as Azure services which depend upon ARM for their internal resource management operations."

What caused the multi-hour performance wobble? Readers will be pleased to know that quality control measures are still top of mind within Microsoft.

"This issue was the result of an interaction between a recent code change in West Europe which introduced a subtle performance regression, and a specific internal Azure workload isn West Europe which exercised this performance regression in a manner which resulted in significant resource saturation."

ARM, as Azure customers will know, is a deployment and management services (inc access control, locks and tags) that provides a management layer so admins can create, update and delete resources in their Azure account.

The result of the failure was "increased latency and timeouts for both internal and external ARM calls," Microsoft says.

Engineers for the cloud and software biz spotted and temporarily blocked the internal workload to "mitigate the impact to customers." Just after 0700 UTC, customer traffic in the region returned to "nominal success rate levels."

Microsoft was rolling back the code change in question in a "globally safe manner" and will "continue to monitor the situation after the rollback has been completed."

"We have identified the specific code responsible for this performance regression and are working on an alternative implementation which avoids the potential for this type of failure mode in the future," Microsoft adds.

A preliminary Post Incident Report on the root cause of the failure and repair items used will need to wait three days, and a final report "where we will share a deep-dive into the incident" is scheduled for a fortnight from today.

Today's incident won't have caused nearly as much gnashing of teeth as the one in the last week of January, when a WAN router IP address change was blamed for a worldwide Microsoft 365 outage.

Gotta love the cloud. ®

Send us news
20 Comments

Microsoft adds FPGA-powered network accelerator to Azure

'Azure Boost' vastly speeds cloudy server IOPS and is coming to all new instance types

Microsoft, Databricks double act tries to sew up the data platform market

But the one-stop shop vision fails to take it far beyond the competition

Microsoft opens sources ThreadX under MIT license

The 'Azure RTOS' used in millions of Raspberry Pis is now FOSS

AWS accuses Microsoft of clipping customers' cloud freedoms

World's biggest off-prem service slinger submits comments to UK cloud inquiry, mostly has Redmond HQ's rival in its sights

Google submits complaints about Microsoft licensing to UK competition regulator

Now Microsoft has regulator breathing down its neck in three regions

FTC wants Microsoft's relationship with OpenAI under the microscope

Hey Bing, how can I invest billions in a company but not break antitrust laws?

Microsoft partners with labor unions to shape and regulate AI

Redmond reassures AFL-CIO workers they won't be pushed out by technology

Experienced Copilot help is hard to find, warns Microsoft MVP

Almost nobody has used it, or knows it well, so beware of consultants bearing cred

Microsoft's relationship with OpenAI now in competition regulator's sights

Has recent CEO, board shenanigans given rise to a merger situation? CMA is asking for a friend

Attacks abuse Microsoft DHCP to spoof DNS records and steal secrets

Akamai says it reported the flaws to Microsoft. Redmond shrugged

Microsoft issues deadline for end of Windows 10 support – it's pay to play for security

Limited options will be available into 2028, for an undisclosed price

Microsoft pushes Azure Government Cloud as homefront defender

All your national security are belong to us!