Observability's not-so-secret link to revenue
In a lot of companies, "observability" and "monitoring" are words that only the technical teams understand. Perhaps the legal team gets involved because products carry SLAs that must not be breached, since a breach costs the company money in penalties. But this is the wrong way to look at observability. It isn't simply a tool that verifies your application is performing properly so you can avoid penalties. Instead, think of observability as a magnifying lens that helps every team in your company see how to increase revenue by making sense of the complexities of your product.
There are two ways to think about proper observability. First, it quantifies things that weren't measurable before, and anything that can be measured can be improved. Second, it measures the entire, complex system: both the portions technology teams traditionally think about ("their code") and the third-party dependencies that no one thinks about until customers complain.
In fact, many SLAs include clauses like "excludes events outside of the company's control." That's convenient when your application is down and you don't have to pay a penalty, but isn't the reputational damage, along with the revenue lost from users who can't reach the site, worth far more than the penalty?
Better still, instead of using observability as a crutch for understanding failures, why not use it as a tool for improvement? Have you considered how much higher revenue might be if your site loaded a few milliseconds faster?
Gaining a more complete picture
The ultimate goal should be observability across every aspect of the system, both the portions under the organization's control and those outside it. This becomes crucial as systems grow more complex.
Conventional monitoring tools capture only a snapshot of problems and why they arise. Visibility tools such as static analysis, Application Performance Monitoring (APM) and Network Performance Monitoring (NPM) don't show how code and other components operate and interact in the real world. Because they don't take the whole system into account, these tools often miss real-world problems.
Today, with containers, microservices and multi-cloud frameworks emerging as the new normal, and APIs tying together as many as a couple of dozen applications, it's no longer possible to simply check a web app server, app client, cache or database to diagnose a problem. Web applications incorporate code from many sources, and the connection points with vendors and business partners (for critical functions like content delivery, shopping carts and web payments) can involve many different applications.
Performance problems have real-world consequences. If 19 components work perfectly but one lags or fails, the result can be a slow-loading web page or a payment that simply won't process. Today's consumers expect tasks to complete quickly and efficiently, so systems must be resilient: available, reliable and performant. For an IT team dealing with a resilience issue, it's critical to identify its origin. Is it a DNS resolution failure or a coding issue? A backbone provider problem or a bug in an API?
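To make that concrete, here is a minimal sketch, using only the Python standard library, that times each phase of a single HTTPS request separately. The hostname is a placeholder, not an endpoint from this article. Whichever phase dominates points to a different origin: slow DNS implicates resolution, a slow connect implicates the network path, and a slow first byte implicates the application or an upstream API.

```python
import socket
import ssl
import time

HOST = "www.example.com"  # placeholder hostname; substitute your own

t0 = time.perf_counter()
ip = socket.getaddrinfo(HOST, 443, proto=socket.IPPROTO_TCP)[0][4][0]  # DNS resolution
t1 = time.perf_counter()

sock = socket.create_connection((ip, 443), timeout=5)                  # TCP connect
t2 = time.perf_counter()

ctx = ssl.create_default_context()
tls = ctx.wrap_socket(sock, server_hostname=HOST)                      # TLS handshake
t3 = time.perf_counter()

tls.sendall(f"GET / HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n".encode())
tls.recv(1)                                                            # time to first byte
t4 = time.perf_counter()
tls.close()

print(f"dns={(t1 - t0) * 1000:.0f}ms connect={(t2 - t1) * 1000:.0f}ms "
      f"tls={(t3 - t2) * 1000:.0f}ms ttfb={(t4 - t3) * 1000:.0f}ms")
```

Real monitoring products measure these phases continuously and from many vantage points; the value of even a crude version like this is that it separates "the site is slow" into causes a team can actually act on.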
For complex systems, companies should use Internet Performance Monitoring (IPM) to get a holistic view of the overall system and its dependencies. Implemented correctly, IPM gives a high-level view from the user's perspective as well as fine-grained detail about the performance of each dependency and endpoint, even the ones you don't control. For applications or networks that are under the company's control, IPM can easily be supplemented with APM or NPM tools.
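As a rough illustration of the idea (a generic sketch, not any particular IPM product's API), the following probes a set of first-party and third-party dependencies from the outside, the way a user's browser would reach them, and reports availability and latency per endpoint. All URLs are placeholders.

```python
import time
import urllib.request

# Placeholder endpoints: one you control, two you don't.
DEPENDENCIES = {
    "web_app":  "https://www.example.com/",
    "payments": "https://pay.partner.example.net/",
    "cdn":      "https://static.cdn.example.org/",
}

for name, url in DEPENDENCIES.items():
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            ok = 200 <= resp.status < 400
    except Exception:
        ok = False  # DNS failure, timeout, TLS error or HTTP error status
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{name:10s} up={ok} latency={elapsed_ms:.0f}ms")
```

Run on a schedule from several geographic locations, even a check this simple begins to approximate the user's view of the whole system, third parties included.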
Moving beyond basic diagnostics
Just as more pixels produce a better image, more data improves the ability to diagnose problems. For example, knowing that an API is running slowly is only a starting point. For most organizations, the metrics that matter don't stop at the performance of an API endpoint; they extend to business-level metrics like availability, user-journey performance and revenue. Knowing that a piece of code takes 15 milliseconds to execute is useful, but only if that information can be woven into the whole story of system performance and mapped back to those business-level metrics.
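Here is one hedged sketch of what that weaving can look like in practice; the metric names, journey labels and emit() sink are all hypothetical. The point is the shape of the data: a low-level timing tagged with the user journey and dependency it belongs to, so a 15-millisecond API call can be rolled up into "checkout journey latency" instead of floating in isolation.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class Measurement:
    name: str        # low-level metric, e.g. an API call duration
    value_ms: float
    journey: str     # business-level context: which user journey
    step: str        # which step of that journey
    endpoint: str    # which dependency served it

def emit(m: Measurement) -> None:
    # Stand-in sink; a real pipeline would ship this to a metrics store.
    print(json.dumps(asdict(m)))

start = time.perf_counter()
# ... call the payment API here (omitted in this sketch) ...
elapsed_ms = (time.perf_counter() - start) * 1000

emit(Measurement(
    name="payment_api.duration",         # hypothetical metric name
    value_ms=elapsed_ms,
    journey="checkout",
    step="submit_payment",
    endpoint="pay.partner.example.net",  # placeholder endpoint
))
```

With tags like these attached at the source, rolling individual timings up into journey-level or revenue-level views becomes an aggregation problem rather than an archaeology project.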
This approach, especially when combined with training and accountability, establishes the baseline for a best-practice framework. Ultimately, developers, engineers and operations specialists have the tools they need to excel, and as a result customers enjoy a better digital experience, one that correlates directly with revenue.
Implemented correctly, a company's monitoring or observability strategy weaves throughout the software development lifecycle and spans both technical and non-technical teams. It is the framework that keeps application developers and application operators aligned, and it gives the CEO or CRO insight into the relationship between application resilience and the bottom line.
Ask your technical teams what their application performance goals are and how they’re being measured. Knowing these goals and constantly working to improve the baselines will make your teams more cohesive and your application better. Understanding your application’s impact on the customer’s experience makes your company better. And as a nice side effect, it keeps the CEO happy.
Image credit: Gunnar Pippel/Shutterstock
Sergey Katsev is VP Engineering, Catchpoint.