Why it's time to guarantee resilience in our critical software
![Cyber resilience](https://betanews.com/wp-content/uploads/2022/11/Cyber-resilience-640x449.jpg)
Software has become central to our daily lives, with nearly every major company relying on it to operate. We are all increasingly dependent on fault-free software for almost everything we do -- whether it’s ensuring trains run on time, accessing websites or using online banking.
Software has evolved into a form of digital public infrastructure, just as vital as physical infrastructure like roads and utilities. Yet, despite its critical role, software largely goes unmonitored and unregulated.
As software-related outages become more frequent, the risks are becoming impossible to ignore. Recent Harness research found that almost half (44 percent) of people have been affected by an IT incident. Over a quarter (26 percent) were affected by the CrowdStrike outage in July 2024, which disrupted airlines, banks, hospitals and other essential businesses.
These events are occurring with alarming regularity, prompting widespread concern and calls for action. To protect consumers, urgent changes are needed to ensure the apps and systems we rely on don’t fail us.
The real impact of software outages
The CrowdStrike incident was not an isolated event; software outages have dominated headlines over the years, from faulty configuration changes at Meta to Sonos releasing an app with a number of features missing.
What’s often overlooked is the real-world impact these outages have on ordinary people. According to recent findings, 66 percent of people believe that releasing “bad” software code leading to outages is just as bad as -- or even worse than -- supermarkets selling contaminated food. Additionally, 52 percent think software companies responsible for such failures should face consequences, including compensating affected businesses, government fines, or even temporary suspension from trading.
These events also erode consumer trust. For example, 41 percent of consumers are less trusting of companies that experience IT outages, and over a third (34 percent) have changed their behavior because of them. This includes ensuring they have cash on hand (19 percent), keeping more physical documents (15 percent), and diversifying service providers, such as using multiple banks (11 percent), to minimize the impact of potential disruptions.
What’s most concerning is that these outages are not caused by malicious actors or cybersecurity breaches but by preventable mistakes in the software development process -- errors and lapses in quality control that could have been avoided. These failures are causing significant disruption to consumers’ lives, and people are growing increasingly frustrated. The message is clear: companies must prioritize quality control to rebuild trust and minimize the real-life impact on their customers.
Calling for regulations
Consumers are crying out for change; 74 percent of them think there should be regulation to ensure that businesses are held accountable for delivering software updates that lead to outages.
Just as they do for the banking and healthcare industries, or in cybersecurity, regulators should consider mandating minimum standards for the quality and resilience of the software that has become embedded in our daily lives. New regulations coming into force in 2025, such as the EU's Digital Operational Resilience Act (DORA), are an indication that the wheels are already in motion. DORA aims to enhance the digital operational resilience of financial services by enforcing regular testing of critical systems to ensure software can handle disruptions. Regulators will likely go further and extend these rules to more organizations and sectors, enforcing stricter standards.
If they want to get ahead of such measures, software providers will need to implement modern software delivery practices that enable them to continuously improve code quality and drive more stable release cycles. For example, simple steps like feature flags and canary deployments could have drastically reduced the impact of the problems CrowdStrike encountered, by ensuring the update only went to a few devices to begin with. This would have helped its engineers to identify potential issues early and mitigate them before they snowballed into a global IT meltdown.
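To make the idea concrete, here is a minimal sketch of a percentage-based canary rollout behind a feature flag. It is an illustration only, not a description of how CrowdStrike or any particular vendor ships updates: the function names, the stage percentages, and the 1 percent failure threshold are all hypothetical choices made for this example.

```python
import hashlib

# Hypothetical rollout stages: percentage of devices that receive the update.
STAGES = (1, 5, 25, 100)


def rollout_bucket(device_id: str, feature: str) -> int:
    """Map a device to a stable bucket in the range 0-99.

    Hashing keeps the assignment deterministic, so a device that is in the
    canary group at 5 percent is still included when the rollout widens.
    """
    digest = hashlib.sha256(f"{feature}:{device_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(device_id: str, feature: str, percent: int) -> bool:
    """Feature flag check: is this device inside the current rollout percentage?"""
    return rollout_bucket(device_id, feature) < percent


def next_stage(current_percent: int, failures: int, devices_updated: int,
               max_failure_rate: float = 0.01) -> int:
    """Decide the next rollout percentage from canary health data.

    Returns 0 (halt and roll back) if the observed failure rate exceeds the
    assumed threshold; otherwise advances to the next stage in STAGES.
    """
    if devices_updated > 0 and failures / devices_updated > max_failure_rate:
        return 0
    for stage in STAGES:
        if stage > current_percent:
            return stage
    return current_percent  # already fully rolled out
```

In this sketch, an update starts at 1 percent of devices and only widens when the canary group stays healthy. A spike in failures drops the percentage to zero, which switches the flag off for every device before the faulty change spreads further.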
These practices will increase software resilience, improving trust and brand loyalty, while getting ahead of any regulations that are coming. This will allow the software industry to get on the front foot and relegate major global IT outages to the past.
Reliability and resilience
The CrowdStrike outage was a stark reminder of the widespread impact software failures can have on businesses across all sectors around the world. As we move toward an increasingly digital and more highly regulated world, the importance of software reliability can’t be overstated.
While engineers rigorously test their code, the sheer volume of releases means that even the most experienced developers cannot catch every bug. As we’ve seen, a single piece of flawed code can disrupt the global economy on a massive scale, highlighting the critical need for robust development practices.
Regulations will play a key role in safeguarding digital public infrastructure, but organizations must also adopt effective software delivery practices to ensure they are doing everything in their power to protect their customers, their bottom line, and their reputation. Organizations with the most effective practices will be better positioned to innovate rapidly and secure a stronger competitive edge in their market.
Image Credit: putilich/depositphotos.com
Martin Reynolds is Field CTO at Harness.