Devops

GitLab deploys on a Friday and ... is down for a few hours

Snafu blamed on config change

Fri 7 Jul 2023 // 19:45 UTC

Updated

GitLab, a hosted git service not unlike Microsoft's GitHub, was down for some users as of Friday morning, Pacific Time.

Around 1634 UTC (0934 PT), the code hosting service started returning 503 Service Unavailable errors to those attempting to access the website.

Software developers who depend on the service were quick to celebrate the unexpected day off.

They also took time to cite sysadmin superstition about not deploying on a Friday. "GitLab seems to have deployed on a Friday breaking their site," quipped UK-based dev Luke Warlow. "Which is annoying cause it's stopping me deploying on a Friday and breaking my site."

The issue page for the IT breakdown itself returned an error banner when loading: "An error occurred while fetching the incident status. Please reload the page."

Nonetheless, the page did load, and it presently describes the cause of the downtime as a "config change."

"The service is currently being restored, we're taking multiple measures to have an immediate restore of the service, as long as a targeted fix to the root cause," the issue page explains.

"More information will be added as we investigate the issue. For customers believed to be affected by this incident, please subscribe to this issue or monitor our status page for further updates."

The impact is described as a site-wide outage and some customers, it's said, should expect their projects to be unavailable "for a period of time after service is restored."

GitLab did not immediately respond to a request for further information.

The GitLab status page appears to blame Google Cloud, noting that the affected location is "Google Compute Engine."

(The only glitch we can see on Google Cloud is some disruption around the world stemming from the Google Kubernetes Engine, but that is just a problem with "unexpected additional messages in GKE cluster logs" rather than unavailable systems. So we take GitLab's status page to mean that the downtime was caused by something within its GCE deployment.)

GitLab's status page lists the following GitLab services as disrupted: Git Operations, Container Registry, GitLab Pages, CI/CD - GitLab SaaS Shared Runners, CI/CD - GitLab SaaS Private Runners, CI/CD - Windows Shared Runners (Beta), SAML SSO - GitLab SaaS, Background Processing, and Canary.

As of 1846 UTC (1146 PT), the status page reported that the issue was still being investigated: "We have implemented a fix to mitigate Web/API services. Investigation is ongoing for other services."

At least the incident does not appear to be as severe as GitLab's 2017 loss of production data, in which an administrator deleted a directory on the wrong server during a replication process, resulting in the loss of 300 GB of live production data. ®

Updated to add

According to a postmortem report by GitLab, the outage was caused in part by a change request: "an old pipeline was triggered, applying an obsolete Terraform plan to the production environment."

While you're here... We just want to flag up that the Fedora Linux project is considering adding the collection of usage metrics – some might call it telemetry – to the distribution from release 40 on an opt-out basis. The current release is 38. The project hasn't yet worked out what metrics to collect, and says it is keen to preserve users' privacy. We're keeping an eye on it.
