Why automation can help continuously validate security policies [Q&A]
Security professionals all know that they should test their security hardware and software periodically to make sure it's working as intended. Many normal IT activities have unintended consequences that cause security configurations to 'drift' over time and make the organization more vulnerable.
But testing is frequently postponed or ignored because it never becomes a high enough priority. We spoke to Song Pang, SVP of engineering at NetBrain, to find out how automation can be used to detect when security products or network traffic are no longer behaving as intended.
BN: Why are network outages and service degradations becoming more common despite more investment in networking technology?
SP: There are three main reasons. The first is complexity. Public and private clouds, virtualized components, microservices, software-defined LAN and WAN capacity, and more have all made network operations' tasks much more complicated and labor intensive. A recent report by SolarWinds found that hybrid IT is increasing network complexity; 49 percent of tech pro respondents said that the acceleration of hybrid IT has increased the complexity of their organization's IT management, and 36 percent admitted they were only somewhat confident in their ability to manage that complexity.
The second is the attrition and retention costs being seen across disciplines, including among NetOps staff, especially those with specialized experience and knowledge. Covid-19 has exacerbated an existing shortage of specialized IT talent, and the current economic outlook means many IT teams have constrained budgets to hire and retain staff even if they can find them. The learning curve for new engineers is steep without the ability to harness the institutional knowledge of the network.
The third and final factor is the outdated NetOps processes themselves. Overall, NetOps processes and approaches are largely manual and script-dependent and have stagnated for decades. They are almost always inconsistent and non-repeatable, and they focus on individual device health rather than IT service delivery outcomes. This creates inefficiencies in the NetOps team, and means that knowledge from previous troubleshooting usually cannot be applied when the same problems recur elsewhere or at another time. This all adds up to lots of wasted time on repetitive and common tasks, an inability both to find transient issues and to diagnose whether issues are due to the network, long post-incident war room discussions, and ongoing network outages.
BN: What's the current state of NetOps teams at major enterprises?
SP: NetOps teams at most enterprises have been stretched to breaking point and getting through each day without a major catastrophe is a common goal. The increasing complexity of enterprise networks means the number of trouble tickets has skyrocketed to the point that keeping up with those tickets in a timely manner is impossible. A large multinational customer regularly reported more than 12,000 network remedial service tickets each month, each requiring on average four hours to resolve. It sometimes took days for engineers to begin working on a ticket after the issue was reported.
Many NetOps teams are also stuck in the past, using badly outdated processes focused on manually maintaining device health using barrages of command-line sequences or creating fragile ad-hoc scripts. This approach is almost entirely reactive, with operational leaders still focused on finding, escalating, diagnosing and trawling through large amounts of code to fix problems when they occur, rather than fixing issues in a smarter way, or even preventing problems from occurring in the first place. This is ironic, since it costs dramatically less to prevent a disruption before it hits production services than it does to restore operations after a problem manifests in production and affects the business.
BN: How is the current IT skills shortage and IT budget scrutiny affecting NetOps?
SP: For the past several years, many of the industry's top IT analyst firms have been citing a dwindling pool of experienced resources after COVID. The Uptime report also found that, "Problems with attracting and retaining staff appear to be worsening… over half (53 percent) of operators surveyed report difficulty finding qualified candidates for open jobs -- up from 47 percent in 2021, and 38 percent in 2018. Operators also face difficulties with employee retention -- 42 percent report staff being hired away, which is more than double the 2018 figure of 17 percent." Shrinking budgets will further limit IT departments' ability to employ experienced engineers and will make hiring and retention even harder and more costly.
Organizations are already struggling to find qualified people to manage their hybrid cloud-connected digital infrastructures. Fewer highly skilled resources on staff means network service tickets take longer to resolve and require more escalations. If only one engineer on a team has the knowledge to fix a particular problem or update a particular piece of hardware, what happens when they're out sick or on vacation or they live 1,000 miles from where the issue is reported? Or worse, what if they leave the company without having documented their network knowledge? If someone else gets a trouble ticket, how long will they have to wait until the right subject matter expert is available to help them?
Despite the increasing network complexity and rising frequency and cost of network outages, the skills shortage and budget scrutiny have made 'do more with less' the motto of many NetOps teams for the near future.
But the most informed IT leaders know that there are smarter ways to accomplish the same result. The most direct is the adoption of no-code network automation, which captures the expertise of an organization’s subject matter experts, replicates that knowledge to apply to hundreds or thousands of similar scenarios, and can then be executed on demand, in response to external events, or even proactively to continuously verify various conditions before they cause significant problems.
BN: How can NetOps processes be updated to become more strategic and efficient for modern networks?
SP: The good news is that today's most aggressive CIOs and IT leaders are looking for change to meet their service delivery commitments. They are looking for the means to ensure their network-connected business is focused on delivering service outcomes. Specific KPIs are being created based on response times, service costs, unplanned downtime, severity, the size of the user population affected, and so on.
They are looking for no-code network automation. They are looking for the means to enforce the network design, meet application service needs, and manage costs. By starting with a fresh view of the NetOps role and the traditional inefficiencies found in their current processes, NetOps can start proactively managing to the desired business outcomes (for example, keeping VoIP calls clear, maintaining security policy compliance, or keeping the application responsiveness needed in e-commerce). The traditional reactive, device-level, code-based method of network operations just can’t scale to match the complexity of modern networks, especially in light of constrained talent. A no-code network automation approach will easily increase operational efficiency and reduce Mean Time to Recovery, service outages, and unplanned downtime without adding cost or overhead.
BN: What role should no-code network automation play in this process? Which tasks can be automated (and which ones shouldn't be)?
SP: No-code network automation can make modern NetOps dramatically more effective when compared to the widely used and decades-old manual device-centric approaches. Applying no-code network automation to modern network operations enables:
- Prevention of service failures long before they impact production. Automation can verify and validate long lists of operational parameters continuously, comparing real-time network performance to known expected behaviors and detecting problems in the making before they affect the bottom line. This can prevent configuration drift, failover failures, performance degradation and security attacks.
- Automation of preliminary problem diagnostics for every service ticket to make the network engineer’s time more productive and speed up root cause determination. This can reduce service disruption duration and remedial costs by 75 percent or more.
- Capture and execution of SME remedial best practices, allowing their knowledge to scale across geographies and time. Creating a repository of all subject matter experts' knowledge and making it available to be executed by the entire team enables engineers to solve issues that they may not specialize in and that would otherwise result in escalations.
- Accurate network visualization, modeling, and mapping in real-time. Every aspect of IT service delivery management relies on accurate understanding of the real-time digital infrastructure. And the structure has context including the devices themselves, their connectivity, the flow of information, and the expected behaviors. Two-dimensional or simple connectivity documentation increases risk and costs time and money, so understanding the context of the hybrid network is fundamental to smarter operations.
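As a rough illustration of the first point in the list above, continuous validation amounts to comparing what the network is actually doing against a record of what it is supposed to be doing. The Python sketch below is a minimal, hypothetical example of that comparison and is not taken from any vendor's product; the device names, the setting names and the `find_drift` helper are all assumptions made for illustration, and a real deployment would collect the observed state from live devices on a schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    """One setting whose observed value differs from the intended value."""
    device: str
    setting: str
    expected: object
    observed: object


def find_drift(baseline, observed):
    """Return every baseline setting whose observed value does not match.

    Both arguments map a device name to a dict of setting -> value.
    A device or setting missing from `observed` is reported with
    an observed value of None.
    """
    findings = []
    for device, expected_settings in baseline.items():
        actual_settings = observed.get(device, {})
        for setting, expected in expected_settings.items():
            actual = actual_settings.get(setting)
            if actual != expected:
                findings.append(Drift(device, setting, expected, actual))
    return findings


# Hypothetical intended state (all names are illustrative assumptions).
baseline = {
    "fw-edge-1": {"telnet_enabled": False, "default_inbound": "deny"},
    "core-sw-1": {"failover_peer": "core-sw-2"},
}

# Hypothetical state as collected from the devices at check time.
observed = {
    "fw-edge-1": {"telnet_enabled": True, "default_inbound": "deny"},
    "core-sw-1": {"failover_peer": "core-sw-2"},
}

for d in find_drift(baseline, observed):
    print(f"{d.device}: {d.setting} expected {d.expected!r}, observed {d.observed!r}")
```

Run on a schedule, a check of this kind would flag the re-enabled Telnet service on the hypothetical firewall before anyone opened a ticket about it, which is the preventive behavior the list describes.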
BN: What are the benefits of this no-code approach to network operations and do they help solve the issues discussed earlier?
SP: No-code network automation will make modern NetOps dramatically more effective when compared to the widely used and decades-old manual device-health-centric approaches. No-code automation revolutionizes NetOps with a strategic and forward-looking approach, far beyond the capabilities and scale of existing processes in place today. With the strategic goal of directly supporting business initiatives, while at the same time reducing risk and overhead costs, no-code network automation becomes transformational. One of our customers recently estimated that no-code network automation reduced the operational servicing costs of their network by more than half and, even more astounding, reduced the duration of service degradation incidents by more than 75 percent.
BN: How should Enterprise NetOps extend their efforts to the public cloud?
SP: Modern hybrid networks typically include almost half of their workloads based in the cloud. According to analyst firm Gartner, almost $591 billion will be spent on cloud services in 2023. So it is imperative that any forward-looking management strategy include the cloud as a standard platform. No-code network automation treats all components, physical or virtual, edge to cloud, equally and enables management tasks to span the entirety of the structure. Only through this multi-vendor, edge-to-cloud approach can user experience and service delivery be managed effectively.
Traditional NetOps should extend their management domains and operational plans to include the cloud. And since the public cloud is such an integral part of many companies' IT infrastructure, IT leaders should seek the same level of visibility and control, along with the ability to automate performance, design compliance and security enforcement regardless of platform.
Image credit: videoflow/depositphotos.com