
3 best practices for network observability

Research shows a correlation between NetOps success and network observability. Following network observability best practices can facilitate this success.

Network observability is a vital method of developing network intelligence that many network teams aren't using.

Observing a network could boost the success of an enterprise's network operations (NetOps) by a large margin. Teams can take a few steps to implement network observability, and doing so will enable network administrators to better understand their networks and guarantee adequate service for their end users.

What is network observability?

When teams monitor a network, they focus on the performance of the network. If a problem arises in a network, monitoring enables network admins to detect the issue. While teams are able to troubleshoot network issues with management and monitoring, network observability can provide a more thorough assessment of the network. When teams observe a network, they aim to understand how a problem occurred, how to correct the issue and how to improve the network to prevent future errors from occurring.

Network observability can also be defined as "solving the problem of reconstructing the end-user experience state variables from the measured ones in the minimum possible length of time," said Göran Edin, CTO of Data Ductus, a software engineering consulting company, in a recent webinar.

Edin, whose definition is an amendment of Rudolf Kalman's definition of the observability of control systems, outlined the following principles for enterprises to make their network services observable:

  1. measure end-user experience;
  2. use telemetry methods to glean data; and
  3. provide service assurance to ensure quality services for customers.
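
As background on the Kalman definition mentioned above, the following is a brief sketch of the classical observability condition for a linear time-invariant control system. The symbols (A, B, C, x, u, y, n) are standard control-theory notation and do not come from the webinar; how closely they map onto a network is an analogy, not a claim Edin made.

```latex
% State-space model: x is the internal state (n variables),
% u the inputs, y the measured outputs.
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t)

% The system is observable when the internal state can be
% reconstructed from the measured outputs, which holds
% if and only if the observability matrix has full rank n:
\mathcal{O} =
\begin{bmatrix}
C \\ CA \\ CA^{2} \\ \vdots \\ CA^{\,n-1}
\end{bmatrix},
\qquad
\operatorname{rank}(\mathcal{O}) = n
```

Read loosely, the end-user experience plays the role of the hidden state x, and the collected telemetry plays the role of the measured output y.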

Three steps for network observability

Enterprises can follow these three fundamental practices to make their networks observable.

1. Focus on end-user experience

Research indicates the positive effect of measuring end-user experience. According to a study conducted by Enterprise Management Associates on network management megatrends in 2020, one-third of IT problems are reported by end users before they are detected by the NetOps team. Enterprises that measure and monitor end-user experience reported more successful operations.

While these statistics emphasize the significance of network monitoring, observing end-user experience could provide more valuable information about how to improve the network. Monitoring the network only enables teams to collect information about the network, and that is "not enough," Edin said.

Network pros should observe the network to gain insight and create a data-driven system to make decisions best suited for the development of the network. With more applications moving to the cloud or evolving into complex distributed systems, investing in observability systems based on end-user experience could simplify NetOps management. Ideally, this system should be able to predict potential problems, simulate scenarios and recommend network improvements, Edin said.

2. Use telemetry for NetOps

Network pros will need to collect sufficient data to create a system that will make their network services observable. They must use the most relevant methods of telemetry to collect data to monitor and observe network services. Several modes of telemetry exist, but the most relevant types for network monitoring are data configuration, synthetic data and device telemetry.

  • Data configuration is when network admins choose data that represents the operating intent. Discovering the operating intent is a step toward intent-based networking, and it lets network pros learn how their networks behave. Edin said that, in his experience, it's difficult for network pros to monitor end-user services without knowing the operating intent.
  • Synthetic data is when teams use synthetic traffic to simulate the end-user experience, which is the closest they can get to emulating it, Edin said. Mimicking user interaction lets admins assess how users engage with the network.
  • Device telemetry is when admins use metrics to examine the state of the network. This form of telemetry, when used along with synthetic data, is a valuable data collection tool for teams, according to Edin, because it helps them determine the root cause of an issue.
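
To make the pairing of synthetic data and device telemetry more concrete, here is a minimal Python sketch. It assumes a scripted user transaction is available as a plain callable and that device metrics arrive as a dictionary. The names `run_synthetic_probe`, `summarize`, `suspect_devices` and the `interface_utilization` metric are hypothetical and chosen only for illustration; they do not come from Edin or from any particular product.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ProbeResult:
    ok: bool            # did the scripted transaction complete?
    latency_ms: float   # wall-clock time of the attempt


def run_synthetic_probe(transaction: Callable[[], None],
                        attempts: int = 5) -> List[ProbeResult]:
    """Run a scripted user transaction several times and time each attempt."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            transaction()
            ok = True
        except Exception:
            # Any exception counts as a failed user experience.
            ok = False
        latency_ms = (time.perf_counter() - start) * 1000.0
        results.append(ProbeResult(ok, latency_ms))
    return results


def summarize(results: List[ProbeResult]) -> Dict[str, Optional[float]]:
    """Reduce probe results to a success rate and a median latency."""
    latencies = [r.latency_ms for r in results if r.ok]
    return {
        "success_rate": len(latencies) / len(results) if results else 0.0,
        "median_latency_ms": statistics.median(latencies) if latencies else None,
    }


def suspect_devices(device_metrics: Dict[str, Dict[str, float]],
                    max_utilization: float = 0.9) -> List[str]:
    """List devices whose (hypothetical) utilization metric exceeds a limit."""
    return sorted(
        name for name, metrics in device_metrics.items()
        if metrics.get("interface_utilization", 0.0) > max_utilization
    )
```

In this sketch, the probe shows that users are affected, and the device metrics narrow down where to look. A real deployment would replace the callable with an actual test transaction and the dictionary with metrics streamed from the devices.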

While these approaches are useful to collect data, they primarily assist with monitoring a network. They become more relevant when teams want to provide service assurance, because the data can be used to determine if a network is functional and if its services are working properly.

To collect quality data that can be used for network observability, network teams must ensure their collected data is relevant, coherent, accessible, consistent and well-defined. With high-quality data, they can recognize what works in their networks, what needs improvement and how to apply any modifications.

3. Ensure service assurance

Network observability is part of the service assurance process, Edin said. Once the observability platform or system is built using the telemetry methods of monitoring a network, teams should also be prepared to have a "data preprocessing layer" that can "clean" the data collected from the telemetry methods, he added. This cleaning process ensures the data is high-quality so it can be useful for the observability platform.
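
As one illustration of what such a data preprocessing layer might do, the following Python sketch drops unusable records, removes duplicates and normalizes units before the data reaches the observability platform. The record shape (`source`, `timestamp`, `metric`, `value`, `unit`) and the function name `clean_telemetry` are assumptions made for this example, not a format Edin described.

```python
import math
from typing import Any, Dict, Iterable, List

# Hypothetical unit table: everything is normalized to milliseconds.
UNIT_TO_MS = {"ms": 1.0, "s": 1000.0}


def clean_telemetry(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return de-duplicated, unit-normalized records sorted by timestamp."""
    seen = set()
    cleaned = []
    for rec in records:
        value = rec.get("value")
        unit = rec.get("unit")
        # Drop records with missing, non-numeric, NaN or negative values.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if math.isnan(value) or value < 0:
            continue
        # Drop records whose unit is not in the table above.
        if unit not in UNIT_TO_MS:
            continue
        # A record is identified by where, when and what it measured.
        key = (rec.get("source"), rec.get("timestamp"), rec.get("metric"))
        if None in key or key in seen:
            continue
        seen.add(key)
        cleaned.append({**rec, "value": value * UNIT_TO_MS[unit], "unit": "ms"})
    cleaned.sort(key=lambda r: r["timestamp"])
    return cleaned
```

The rules here are deliberately simple. Which records count as invalid, and which units are canonical, would depend on the telemetry sources a team actually uses.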

Network teams with the software capability can create their own data preprocessing layers or other service assurance systems. They also have the opportunity to use 5G to virtualize the infrastructure and run test agents to verify whether a network's high-performance services are working. Nevertheless, the observability platform must ultimately generate relevant data for teams to make sense of their networks and yield service assurance for customers.

Service assurance should also be part of the overall service lifecycle, Edin said.

"Doing so will not only remove the risk of introducing errors through manual processing, it will also improve the speed of delivery from weeks or months down to at least days," he said, adding that speeding up the process will also reduce labor costs.

Integrate observability with DevOps

Network teams can also incorporate service assurance into a DevOps process by following the same steps Edin outlined. First, they should measure end-user experiences. They can then identify questions about their networks for which they need answers. How easily these questions can be answered also helps indicate how observable a network is.

Network pros should use the best modes of telemetry to gain insight on their network services and create their systems. Edin said he recommends teams begin with data configuration to determine the operating intent.

"Make sure that you have that source of truth in there, showing and telling you what services you have out there," he said.

He next recommended teams follow with device and synthetic telemetry to coherently interpret the end-user experience and examine the success of the system resources. Teams can add other methods of telemetry if needed.
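
One way to picture how the source of truth and the telemetry fit together is a simple coverage check: compare the services the configuration says should exist with the services for which telemetry actually arrives. The sketch below is an illustration under that assumption. The function name `compare_intent_to_observed` and the idea of identifying services by plain string names are hypothetical.

```python
from typing import Dict, List, Set


def compare_intent_to_observed(intended: Set[str],
                               observed: Set[str]) -> Dict[str, List[str]]:
    """Compare intended services with services that report telemetry."""
    return {
        # In the source of truth and reporting telemetry.
        "covered": sorted(intended & observed),
        # In the source of truth but with no telemetry: a blind spot.
        "unmonitored": sorted(intended - observed),
        # Reporting telemetry but absent from the source of truth.
        "unknown": sorted(observed - intended),
    }
```

For example, with `intended = {"vpn", "voip", "web"}` and `observed = {"web", "voip", "legacy-ftp"}`, the check reports `vpn` as unmonitored and `legacy-ftp` as unknown.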

Finally, service assurance should be integrated into network automation. This entire process should be executed, reviewed and repeated as many times as necessary.
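
As a rough sketch of what integrating service assurance into automation could look like, the following Python function measures end-user success before and after an automated change and rolls the change back if the experience degrades. The function name, the callables it receives and the threshold values are all hypothetical placeholders; they are not taken from Edin's talk or from any specific automation tool.

```python
from typing import Callable


def apply_with_assurance(apply_change: Callable[[], None],
                         rollback: Callable[[], None],
                         measure_success_rate: Callable[[], float],
                         min_success_rate: float = 0.99,
                         max_drop: float = 0.01) -> bool:
    """Apply a change, then keep it only if end-user success holds up.

    Returns True if the change was kept, False if it was rolled back.
    """
    baseline = measure_success_rate()   # experience before the change
    apply_change()
    after = measure_success_rate()      # experience after the change
    degraded = after < min_success_rate or (baseline - after) > max_drop
    if degraded:
        rollback()
        return False
    return True
```

Running such a check on every automated change is one way to execute, review and repeat the process as part of the pipeline rather than as a separate manual step.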

As NetOps becomes more automated and new services are developed, there is the risk of teams changing the behavior of their networks and, consequently, changing the experiences of the end user. Providing service assurance, as well as the other steps of the service lifecycle, with network observability reduces that risk, Edin said.
