The five steps to network observability
Let's begin with a math problem. Please solve for “X”: Network Observability = Monitoring + X.
The answer is “Context.” Network observability is monitoring plus context. Monitoring can tell the Network Operations (NetOps) team that a problem exists, but observability tells them why it exists. Observability gives NetOps real-time, actionable insights into the network’s behavior and performance. This makes the team more efficient, which means lower MTTR, better network performance, less downtime, and ultimately better performance for the applications and business that depend on the network. As networks get more complex while IT budgets stay flat, observability has become increasingly important. In the past two years, I’ve heard the term used much more often by engineers and practitioners on the ground. Gartner predicted that the market for network observability tools will grow 15 percent from 2022 to 2027.
So, how do you make a network observable? My colleagues and I see it as a five-step process. Each step builds on the previous one, and observability is the final result. This is more complicated than some definitions of observability, but we believe our version produces more benefits for the NetOps team.
Step 1: Network Discovery and Data Accuracy
This is about devices. For observability, NetOps first needs an accurate roster of all network devices, including device pairs and network clusters, as well as their configurations. Some type of auto-discovery is necessary to do this at enterprise scale. NetOps also needs accurate data from logs, traces, traffic paths, and SNMP, which often requires bringing together telemetry from different systems. This is the foundational layer that all the others depend on.
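One way to picture the data accuracy part of this step is a reconciliation between what auto-discovery finds and what the documentation claims. The sketch below is only an illustration: the device names, model strings, and the `reconcile_inventory` helper are hypothetical, and a real inventory would carry far more attributes than a model name.

```python
def reconcile_inventory(discovered, documented):
    """Compare a discovered roster against a documented one.

    Both arguments map a device name to a model string (hypothetical,
    simplified records used only for illustration).
    """
    discovered_names = set(discovered)
    documented_names = set(documented)
    return {
        # On the network but missing from the documentation.
        "undocumented": sorted(discovered_names - documented_names),
        # In the documentation but not seen by discovery.
        "not_found": sorted(documented_names - discovered_names),
        # Present in both, but the recorded model disagrees.
        "model_mismatch": sorted(
            name
            for name in discovered_names & documented_names
            if discovered[name] != documented[name]
        ),
    }


# Hypothetical sample data.
discovered = {"core-1": "model-a", "dist-1": "model-b", "access-9": "model-c"}
documented = {"core-1": "model-a", "dist-1": "model-x", "access-2": "model-c"}

report = reconcile_inventory(discovered, documented)
for category, devices in report.items():
    print(category, devices)
```

Each non-empty category is a gap in the roster that the later steps would otherwise inherit.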
Step 2: Network Visualizations
This is about network topology and the connections between devices. NetOps must turn data into site maps and network documentation. This has historically been done manually by building Visio diagrams of network sites, but that takes a lot of time, and the maps go out of date quickly with modern networking technologies. Observability requires an automated way to build these maps.
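As a rough sketch of what automated map building involves, the example below turns neighbor records into a list of links and a per-device adjacency map. It assumes the records have already been collected (for example from a neighbor discovery protocol) and reduced to simple tuples; the tuple layout, device names, and the `build_topology` helper are hypothetical.

```python
from collections import defaultdict


def build_topology(neighbor_records):
    """Build links and an adjacency map from neighbor records.

    Each record is (device, local_port, neighbor, neighbor_port),
    a hypothetical simplified format.
    """
    links = set()
    for device, local_port, neighbor, neighbor_port in neighbor_records:
        # Sort the two ends so a link reported by both devices is stored once.
        end_a = (device, local_port)
        end_b = (neighbor, neighbor_port)
        links.add(tuple(sorted((end_a, end_b))))

    adjacency = defaultdict(set)
    for (dev_a, _), (dev_b, _) in links:
        adjacency[dev_a].add(dev_b)
        adjacency[dev_b].add(dev_a)
    return links, {dev: sorted(peers) for dev, peers in adjacency.items()}


# Hypothetical sample data; the first link is reported from both ends.
records = [
    ("core-1", "Eth1", "dist-1", "Eth48"),
    ("dist-1", "Eth48", "core-1", "Eth1"),
    ("core-1", "Eth2", "dist-2", "Eth48"),
    ("dist-1", "Eth1", "access-1", "Eth24"),
]

links, adjacency = build_topology(records)
for device in sorted(adjacency):
    print(device, "->", adjacency[device])
```

Because the map is derived from collected data rather than drawn by hand, it can be rebuilt whenever the data is refreshed.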
Step 3: Network Design and Assurance
This is about baselines for network performance. Next, NetOps needs to understand how the network is supposed to behave under normal conditions. This includes security best practices, like which ports should be open and how backup firewalls should be configured, as well as the performance that important applications need. Building and retaining this knowledge is challenging because enterprise networks are huge and complex, change often, and have quirks built up over decades of operation. When senior engineers retire or change jobs, they often take some of this knowledge with them. NetOps must often “reverse engineer” why a particular policy is in place if no one still at the company was there when it was implemented.
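One way to keep this knowledge from leaving with the people who hold it is to record intent as data, with the reason attached. The sketch below is a minimal illustration under assumed names: the `Intent` fields, the firewall name, the port numbers, and the latency threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """A hypothetical record of how one device is supposed to behave."""
    device: str
    allowed_open_ports: frozenset
    max_latency_ms: float
    rationale: str  # why the policy exists, so it outlives staff turnover


def check_intent(intent, open_ports, latency_ms):
    """Return a list of human-readable deviations from the intent."""
    findings = []
    unexpected = sorted(set(open_ports) - intent.allowed_open_ports)
    if unexpected:
        findings.append(f"{intent.device}: unexpected open ports {unexpected}")
    if latency_ms > intent.max_latency_ms:
        findings.append(
            f"{intent.device}: latency {latency_ms} ms exceeds "
            f"{intent.max_latency_ms} ms"
        )
    return findings


# Hypothetical policy and observations.
fw_intent = Intent(
    device="fw-edge-1",
    allowed_open_ports=frozenset({22, 443}),
    max_latency_ms=20.0,
    rationale="Only management SSH and HTTPS are exposed at the edge.",
)

findings = check_intent(fw_intent, open_ports=[22, 443, 8080], latency_ms=35.0)
for finding in findings:
    print(finding)
```

The `rationale` field does no work in the check itself; it is there so the next engineer does not have to reverse engineer the policy.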
Step 4: Automation
Network automation enables better observability in three ways. First is diagnosis automation, which runs common tests automatically and provides a diagnosis when NetOps gets a trouble ticket. Second is change automation, which runs tests before and after a network change to check for success (and unintended consequences). Last is assessment automation, which regularly checks actual network performance against the intended baselines established in step 3. This gives NetOps more context without increasing their workload. Engineers have immediate access to basic diagnostics, can see whether any recent network changes may have contributed to a problem, and can check whether the network has deviated from normal. This gives them much greater insight into the network.
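Change automation, the second of the three, can be sketched as a comparison of network state captured before and after a change. The example below is an assumption-laden illustration: the snapshot keys, the values, and the `diff_snapshots` and `review_change` helpers are hypothetical, and real snapshots would come from whatever tests the team runs around a change.

```python
def diff_snapshots(before, after):
    """Return {key: (old, new)} for every value that differs."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


def review_change(before, after, expected_keys):
    """Split observed differences into intended and unintended ones."""
    changes = diff_snapshots(before, after)
    intended = {k: v for k, v in changes.items() if k in expected_keys}
    unintended = {k: v for k, v in changes.items() if k not in expected_keys}
    # Expected changes that did not happen mean the change did not take effect.
    missing = sorted(set(expected_keys) - set(changes))
    return intended, unintended, missing


# Hypothetical state captured before and after adding a VLAN.
before = {"vlan30.state": "absent", "bgp.peers_up": 4, "eth1.status": "up"}
after = {"vlan30.state": "active", "bgp.peers_up": 3, "eth1.status": "up"}

intended, unintended, missing = review_change(before, after, {"vlan30.state"})
print("intended:", intended)
print("unintended:", unintended)
print("missing:", missing)
```

In this sample the VLAN change succeeded, but a BGP peer dropped at the same time, which is the kind of unintended consequence the post-change test exists to catch.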
More importantly, network automation can find problems before a user reports them. Automated network assessments can even be run continuously, with the results fed into dashboards that let NetOps monitor the network proactively. This is a major change from NetOps’ usual reactive workflow, and a huge step forward for observability. Most engineers don’t think of automation as an observability tool, but its benefits are undeniable.
Step 5: Observability
All these steps build on one another: accurate data enables reliable, complete network mapping; accurate maps make it possible to measure the design and intent of the network; and that knowledge lets NetOps build automations to assess and enforce those intents. Those automations, in turn, provide more context and allow NetOps to understand their networks more deeply.
Despite what many might think, network automation is a key ingredient in observability. These five steps help NetOps with proactive issue detection, root cause analysis, performance optimization, and security compliance. Observability, as we define it, is hugely helpful in making NetOps more proactive and agile, especially when responding to new or changing network conditions and requirements. And if I’ve learned one thing in the last decade, it’s that networking is always changing.
Song Pang is SVP of Engineering at NetBrain Technologies.