Data centers continue to struggle with outages
The latest Outage Analysis report released by the Uptime Institute reveals that the digital infrastructure sector is struggling to achieve a measurable reduction in outage rates and severity.
One in five organizations report experiencing a 'serious' or 'severe' outage (involving significant financial losses, reputational damage, compliance breaches and, in some severe cases, loss of life) in the past three years, marking a slight upward trend in major outages.
According to Uptime's 2022 Data Center Resiliency Survey, 80 percent of data center managers and operators have experienced some type of outage in the past three years -- a marginal increase over the norm, which has fluctuated between 70 percent and 80 percent.
"Digital infrastructure operators are still struggling to meet the high standards that customers expect and service level agreements demand -- despite improving technologies and the industry's strong investment in resiliency and downtime prevention," says Andy Lawrence, founding member and executive director of Uptime Institute Intelligence.
Over 60 percent of failures result in at least $100,000 in total losses, up substantially from 39 percent in 2019. The share of outages that cost upwards of $1 million also increased from 11 percent to 15 percent over that same period.
Power-related outages are among the most common problems, accounting for 43 percent of outages classified as significant. The single biggest cause of these is uninterruptible power supply (UPS) failure.
Nearly 40 percent of organizations have suffered a major outage caused by human error over the past three years, with 85 percent of these incidents stemming from staff failing to follow procedures or from flaws in the processes and procedures themselves.
"The lack of improvement in overall outage rates is partly the result of the immensity of recent investment in digital infrastructure, and all the associated complexity that operators face as they transition to hybrid, distributed architectures," adds Lawrence. "In time, both the technology and operational practices will improve, but at present, outages remain a top concern for customers, investors, and regulators. Operators will be best able to meet the challenge with rigorous staff training and operational procedures to mitigate the human error behind many of these failures."
There will be a webinar to discuss the report in more detail on June 16th at noon ET.