Network reliance as the internet enters its 'third act'
Information technology professionals know how to adapt to constant change. Yet our laser focus on immediate details means we can lose sight of the big picture and miss an opportunity to stay ahead of the curve. If you read the 2020 State of the Edge report, the very first line may cure that ill:
We stand on the precipice of a profound re-architecting of the Internet…
What this report calls the "Third Act of the Internet" has begun. It includes 5G wireless, network virtualization and new generations of cloud and CDN services, to name just a few of the network changes ahead. The goal is simple to explain and complex in its execution: get information from the source to the user as fast as possible and with as much scalability as needed.
A third internet act "on the edge" (read: closer to the customer) will prepare the network for the next waves of innovation in AI, industrial and consumer IoT, VR, autonomous vehicles and other technologies, all of which will dramatically increase network complexity and put pressure on network capacity.
2019’s Warning Signs
While this sounds future-oriented, there’s evidence that the internet is slowly showing its age even with current-day workloads. The glitches of the past year may offer some early warning signs.
In June 2019, a tiny ISP in Pennsylvania caused one of the internet’s biggest traffic jams, creating a domino effect that crashed large chunks of the web including Amazon Web Services (AWS) and Cloudflare, a major CDN. This and two other outages in the June-July timeframe were network-sourced, a result of BGP leaks. The trust-based Border Gateway Protocol (BGP) has been around for 25 years -- part of the internet’s first act -- but some are now criticizing it as too vulnerable, particularly to bad actors.
These are only a small sample of headline outages. Every day, operations teams deal with network-related problems of varying severity, which makes the network one of the trickier roadblocks to great end-user experiences.
Network Problem Areas
Network issues can affect the performance of your digital applications in many complex ways, but I suggest these four categories warrant our attention:
- Route Health: Monitoring route health is critical, and Ping and Traceroute are the two main approaches. Ping measures the latency and availability between two hosts, while Traceroute shows the hop-by-hop network path your data has taken. Traceroute is vital to quickly determine the source of network latency. Collecting data from every hop and visualizing it in logical diagrams will give your teams an at-a-glance view of the network impact on your services, as well as indicate when route paths deviate from known patterns.
- BGP: As the examples above show, BGP mishaps can cause significant damage to other networks. The protocol manages the exchange of routing information between individual networks, which use a predefined set of criteria to select the route that packets will take. The chosen route is not always the closest geographic one, because selection depends on routing policy and path attributes rather than distance. Any change in BGP routing -- which can be caused by human error or malicious acts -- can block or slow service availability to your users. BGP problems tend to be rare, but their impact can be vast, so it's important to keep them on your radar.
- Connectivity: With route health and BGP covered, the next step is verifying that data can be transferred between two endpoints. This involves monitoring the TCP, MQTT and SSH protocols, each of which has its place in verifying the connection between hosts. MQTT, or Message Queuing Telemetry Transport, is used for machine-to-machine communication and will become particularly important as the number of Internet of Things (IoT) devices grows exponentially, moving beyond nice-to-have consumer products to critical machines with industrial or medical functions.
Finally there’s DNS, which translates text-based domain names into numeric IP addresses. Its solidity no longer a given, DNS has caused some major systemic outages over the last few years, including problems affecting tech giants Apple and Microsoft. Running tests against your DNS dependencies can pinpoint problems or alert you quickly when issues occur.
- Enterprise/Endpoints: Completing our network roundup, the next phase is ensuring your own local network is performing well. Placing monitoring nodes inside your firewall can detect any Wi-Fi or proxy problems that may be causing performance or reachability issues.
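To make the connectivity and DNS checks described above more concrete, here is a minimal sketch in Python using only the standard library. The names `CheckResult`, `check_dns` and `check_tcp` are hypothetical and chosen for illustration; a commercial monitoring product would do far more (MQTT and SSH handshakes, multiple vantage points, retries), so treat this as a starting point under those assumptions rather than a complete monitor.

```python
import socket
import time
from typing import NamedTuple


class CheckResult(NamedTuple):
    """Outcome of a single probe (hypothetical structure for this sketch)."""
    ok: bool
    elapsed_ms: float
    detail: str


def check_dns(hostname: str) -> CheckResult:
    """Resolve a hostname with the system resolver and time the lookup."""
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        elapsed = (time.perf_counter() - start) * 1000
        return CheckResult(False, elapsed, str(exc))
    elapsed = (time.perf_counter() - start) * 1000
    # Each entry ends with a sockaddr tuple whose first item is the IP address.
    addresses = sorted({info[4][0] for info in infos})
    return CheckResult(True, elapsed, ", ".join(addresses))


def check_tcp(host: str, port: int, timeout: float = 3.0) -> CheckResult:
    """Attempt a TCP handshake with host:port and time how long it takes."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # The handshake succeeded; close the connection immediately.
    except OSError as exc:
        # Covers refused connections, timeouts and unreachable networks.
        elapsed = (time.perf_counter() - start) * 1000
        return CheckResult(False, elapsed, str(exc))
    elapsed = (time.perf_counter() - start) * 1000
    return CheckResult(True, elapsed, "connected")
```

Running `check_dns` against each of your DNS dependencies and `check_tcp` against each service endpoint on a schedule yields a simple availability and latency series per target, which is the kind of raw data alerts can then be built on.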
Setting up the proper alerts for each of these categories will give your teams early warning of problems before they impact customers or employees. With these elements covered, your organization now has true end-to-end visibility of the network delivery chain.
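As an illustration of what such alerts might look like, the sketch below shows two simple rules in Python: one flags the first hop at which a traceroute path deviates from a known baseline, and one fires when several consecutive latency samples exceed a threshold. The function names, the default of three consecutive samples and the example addresses (taken from documentation-reserved ranges) are my own assumptions for illustration, not values from the report or from any particular product.

```python
from typing import Optional, Sequence


def first_deviating_hop(baseline: Sequence[str],
                        observed: Sequence[str]) -> Optional[int]:
    """Return the index of the first hop that differs from the baseline,
    or None if the two paths are identical."""
    for index, (expected, actual) in enumerate(zip(baseline, observed)):
        if expected != actual:
            return index
    if len(baseline) != len(observed):
        # One path is a prefix of the other; the deviation starts
        # where the shorter path ends.
        return min(len(baseline), len(observed))
    return None


def latency_alert(samples_ms: Sequence[float],
                  threshold_ms: float,
                  consecutive: int = 3) -> bool:
    """Return True if the most recent `consecutive` samples all exceed
    the threshold. Requiring several breaches in a row avoids alerting
    on a single slow probe."""
    if consecutive <= 0 or len(samples_ms) < consecutive:
        return False
    return all(sample > threshold_ms for sample in samples_ms[-consecutive:])


# Example with made-up data.
baseline_path = ["10.0.0.1", "192.0.2.1", "198.51.100.7"]
observed_path = ["10.0.0.1", "203.0.113.9", "198.51.100.7"]
print(first_deviating_hop(baseline_path, observed_path))     # 1
print(latency_alert([40, 210, 230, 250], threshold_ms=200))  # True
```

The right thresholds and baselines depend entirely on your own traffic patterns, so any values used here should be replaced with ones derived from your historical data.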
Remember, the internet is just a series of smaller networks working together. In the early days, we simply depended on this network of networks with minimal oversight. But its underlying infrastructure is now far more complex, and the changes in progress in this re-architecting -- over $700 billion in CapEx spending over the next decade, according to the report -- will only mean more complexity. As these pieces shift, change and grow, a new level of network telemetry is vital.
Mehdi Daoudi is the co-founder and CEO of Catchpoint, a leading digital experience intelligence company. His team has expertise in designing, building, operating, scaling and monitoring highly transactional Internet services used by thousands of companies that impact the experience of millions of users. Before Catchpoint, Mehdi spent 10+ years at DoubleClick and Google, where he was responsible for Quality of Services, buying, building, deploying, and using monitoring solutions to keep an eye on an infrastructure that delivered billions of transactions daily.