Unplanned service interruptions lead to stress for engineers
Unplanned service interruptions, which can include outages, operational overload, slowdowns in delivery, notification fatigue and other unanticipated events, represent a major issue, according to a new global survey of site reliability engineers (SREs).
The study by digital experience monitoring specialist Catchpoint reveals that 49 percent of respondents say they have worked on an incident in the last week, while the same percentage say they have worked on outages lasting longer than a day during their career.
What's more, these incidents are stressful: 67 percent of SREs who feel stress after each incident do not believe that their company cares about their well-being.
"While stress is part of an SRE's job, the survey shows incidents have been normalized and many organizations are not addressing the impact," says Nithyanand Mehta, VP of professional services at Catchpoint. "Combine this with the 48 percent who said their company hasn't defined service level objectives for essential services, and a question emerges: is the SRE role evolving proactively based on business needs and employee satisfaction, or is it becoming reactive and contributing to IT's high turnover rate?"
Among other findings, nearly 60 percent of respondents say their responsibilities involve excessive amounts of manual, repetitive tasks. Just 38 percent say they've used automation to reduce that workload.
Also, 64 percent say their role or SRE team has been in existence for three years or less, indicating that the job description is still evolving. However, LinkedIn currently lists over 2,000 US job openings for SREs, twice as many as at this time last year, when Catchpoint released its first SRE survey.
"The role of the SRE is critical in an era where the digital experience is directly connected to business outcomes," says Mehdi Daoudi, CEO and co-founder of Catchpoint. "By focusing on the human element, our second SRE survey can hopefully shed light on what effect experience-impacting incidents like outages or slowdowns have on your teams and their ability to avoid or contain them. My biggest takeaway: if most SREs are spending excessive time in repetitive tasks, this does not leave enough room for the key components of a true SRE team -- capacity planning; and improving the performance, availability and resiliency of the systems, applications and services for which they are responsible."
You can find out more about the findings on the Catchpoint blog.