Why streaming telemetry tops SNMP in tracking network performance

SNMP isn't going away anytime soon, but streaming telemetry is a much more effective way to generate critical alerts and measure network performance.

Knowing what's happening on the network is crucial for successful network operations. That's one key reason why network telemetry is becoming such an indispensable tool for security, troubleshooting and performance monitoring.

The problem for network operators is that legacy protocols, such as Simple Network Management Protocol (SNMP), are not robust enough to monitor large, high-performance networks. SNMP isn't dead, but streaming telemetry may be a better option.

Network telemetry refers to the collection of data from network devices, usually for alerting and for some sort of analytics. Until recently, this was accomplished with SNMP polling, syslog messages or NetFlow collection. For example, SNMP polling can alert a network operations center that a switch interface is down or a firewall's CPU utilization is over 90%. 

SNMP has been around for more than 30 years and is among the first protocols invented to monitor network devices. Its ubiquity means many popular monitoring tools use SNMP to collect and organize network information. There are, however, drawbacks to SNMP that spurred the development of new methods for network monitoring.

Why SNMP isn't the best approach

First, SNMP primarily works as a pull function. This means an SNMP collector, or network monitoring station, initiates communication with a device to pull information from object identifiers, or OIDs. The SNMP collector makes a request according to a schedule, and the network device processes that request and sends a reply. This is why SNMP collectors are also called polling stations.
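
The pull model described above can be sketched in a few lines of Python. This is a simulation only, not a real SNMP client: FakeDevice and poll are hypothetical names, and the single OID shown is the standard ifOperStatus object for interface index 1. The point is that the collector starts every exchange and the device does work for every request.

```python
import time
from dataclasses import dataclass, field

IF_OPER_STATUS_1 = "1.3.6.1.2.1.2.2.1.8.1"  # ifOperStatus for ifIndex 1 (1 = up)


@dataclass
class FakeDevice:
    """Hypothetical stand-in for an SNMP agent: a table of OID -> value."""
    oids: dict = field(default_factory=lambda: {IF_OPER_STATUS_1: 1})
    requests_served: int = 0

    def get(self, oid):
        # The device spends resources on every request it receives.
        self.requests_served += 1
        return self.oids[oid]


def poll(device, oid, cycles, interval_s=0.0):
    """Pull model: the collector initiates each request on a schedule."""
    samples = []
    for _ in range(cycles):
        value = device.get(oid)                # request, then reply
        samples.append((time.time(), value))   # timestamp set by the collector
        time.sleep(interval_s)
    return samples


device = FakeDevice()
samples = poll(device, IF_OPER_STATUS_1, cycles=3)
print(len(samples), device.requests_served)
```

Each additional polling station or shorter interval multiplies requests_served, which is the load the following paragraphs describe.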

In a large network in which switches, routers and firewalls are extremely busy, frequent SNMP polling can consume a lot of resources on the device being polled. When polling is too frequent, the burden on the network device can be too high to sustain.

More important, polling too frequently can cause SNMP timeouts, which limit the ability of the monitoring station to survey devices. Coarser polling may not be an issue for some values, such as 10-minute bandwidth averages, but as network speeds keep climbing, visibility gaps of tens of seconds -- let alone several minutes -- are often unacceptable.
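
To see why a long averaging window can hide a problem, consider a small worked example. The numbers are made up for illustration: a link that sits at 10% utilization for a 10-minute window except for a 20-second burst at 100%.

```python
def average_utilization(per_second_utilization):
    """Mean utilization over a window of one-second samples."""
    return sum(per_second_utilization) / len(per_second_utilization)


# Hypothetical 10-minute window: 580 s at 10% load, 20 s saturated.
window = [0.10] * 580 + [1.00] * 20

avg = average_utilization(window)
peak = max(window)
print(f"average={avg:.1%} peak={peak:.0%}")
```

The 10-minute average comes out at about 13%, which looks healthy, even though the link was saturated for 20 seconds inside the window.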

Next, SNMP provides a timestamp only for when a device is queried and not when an event occurred on the network device. When scaled to many devices under heavy load, this produces inaccuracies in the collected data.
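
A rough sketch of the timestamp problem, assuming a fixed polling schedule that starts at time zero. The function name and the numbers are illustrative only.

```python
import math


def first_poll_at_or_after(event_time_s, poll_interval_s):
    """Time of the first scheduled poll that can observe the event."""
    return math.ceil(event_time_s / poll_interval_s) * poll_interval_s


event_time_s = 12        # when the interface actually went down
poll_interval_s = 60     # collector polls at 0, 60, 120, ... seconds

recorded_time_s = first_poll_at_or_after(event_time_s, poll_interval_s)
timestamp_error_s = recorded_time_s - event_time_s
print(recorded_time_s, timestamp_error_s)
```

Here the collector records the event at 60 seconds, 48 seconds after it really happened, and the error can approach the full polling interval.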

On the other hand, SNMP can also push information to collectors. SNMP traps can be generated by a device itself when an event occurs. The trap is sent to the network monitoring system (NMS), and in this way, SNMP can provide a sort of event-driven telemetry. 

However, SNMP traps are not reliable. An SNMP trap is sent to the NMS as a single User Datagram Protocol (UDP) packet, which is an unreliable way to convey information. A single UDP message alerting the NMS that something could be critically wrong in the network wouldn't even be acknowledged by the receiving station.
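
The difference between an unacknowledged send and an acknowledged one can be illustrated with a toy model. This is not real UDP or TCP: the function names are hypothetical, and the set of dropped message indexes stands in for packet loss on the first transmission attempt.

```python
def send_fire_and_forget(messages, dropped):
    """Each message is sent once; the sender never learns what arrived."""
    return [m for i, m in enumerate(messages) if i not in dropped]


def send_with_ack(messages, dropped, max_retries=3):
    """Resend each message until it is acknowledged.

    In this toy model a dropped index loses only its first attempt.
    """
    received, attempts = [], 0
    for i, message in enumerate(messages):
        for attempt in range(1 + max_retries):
            attempts += 1
            lost = i in dropped and attempt == 0
            if not lost:
                received.append(message)  # receiver acknowledges, sender moves on
                break
    return received, attempts


alerts = ["linkDown if1", "linkDown if2", "linkUp if1"]
dropped = {1}  # the second alert is lost in transit once

unacked = send_fire_and_forget(alerts, dropped)
acked, attempts = send_with_ack(alerts, dropped)
print(unacked)
print(acked, attempts)
```

Without acknowledgment, the second alert silently disappears. With acknowledgment and retransmission, all three arrive at the cost of one extra send.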

Finally, SNMP is inefficient. Though SNMP provides some granularity for what it can monitor, the data collection process is not granular at all. For example, SNMP polling can be used to check for interface status. If the polling interval is every minute and no interfaces have gone down for a while, the device will reply to the NMS every minute that everything is up and OK. This is a waste of resources -- both for the individual device and the overall network.
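
As a back-of-the-envelope comparison, the sketch below counts messages for one interface over an hour. The status history is invented, and the on-change count assumes one initial update followed by one update per state change.

```python
def polled_replies(status_by_minute):
    """One reply per polling cycle, whether or not anything changed."""
    return len(status_by_minute)


def on_change_updates(status_by_minute):
    """One initial update, then one update per state change."""
    updates = 1
    for previous, current in zip(status_by_minute, status_by_minute[1:]):
        if current != previous:
            updates += 1
    return updates


# Hypothetical hour: up for 45 minutes, down for 5, up again for 10.
status = ["up"] * 45 + ["down"] * 5 + ["up"] * 10

print(polled_replies(status), on_change_updates(status))
```

Polling every minute produces 60 replies, 57 of which repeat what the NMS already knew, while on-change reporting needs three updates.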

Preferred way to monitor network performance

Streaming telemetry solves these problems and is fast becoming the preferred method to track operations, especially in larger, higher-performance networks. Instead of polling, streaming telemetry is event-driven and initiated by the local device. This means the device is not forced to respond to requests from multiple polling stations.

Also, streaming telemetry can push network data as soon as an event occurs. Though this is somewhat possible with SNMP traps, streaming telemetry typically runs over TCP rather than UDP. TCP acknowledges and retransmits data, which gives reliable, ordered delivery along with mechanisms to optimize data transmission. This improvement alone provides a more dependable and real-time view of the network. SNMP polling, by contrast, provides a snapshot that is only as fresh as the polling interval, which is commonly five minutes.

Streaming telemetry is subscription-based rather than polling-based. Instead of multiple polling stations making repeated requests to the same network device, one or many network management systems subscribe to the data they want from that device. The device then creates only one data set to send to all subscribed collectors.
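
A minimal in-process sketch of the subscription idea follows. The Device class, its methods and the path string are hypothetical and do not correspond to any vendor API. The sketch shows only that the device builds an update once and fans it out to every subscriber.

```python
class Device:
    """Hypothetical device that pushes updates to its subscribers."""

    def __init__(self):
        self.subscribers = []
        self.datasets_built = 0

    def subscribe(self, callback):
        # A collector registers interest once instead of polling repeatedly.
        self.subscribers.append(callback)

    def publish(self, path, value):
        # The update is built a single time, then delivered to everyone.
        self.datasets_built += 1
        update = {"path": path, "value": value}
        for callback in self.subscribers:
            callback(update)


collector_a, collector_b = [], []

device = Device()
device.subscribe(collector_a.append)
device.subscribe(collector_b.append)
device.publish("interfaces/eth0/oper-status", "down")

print(device.datasets_built, collector_a, collector_b)
```

Two collectors receive the same update, but the device did the work of producing it only once.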

Because network devices can reliably communicate network events in real time, streaming telemetry paints a much more accurate view of the network. This provides a direct benefit to network operations because alerting happens immediately after an event. Network operators can remediate problems much more quickly, reducing mean time to resolution. In fact, when part of a greater continuous integration/continuous delivery workflow, streaming telemetry can trigger automated remediation.

A network device streams its information using data models to structure network information, which is generally unstructured in nature. YANG, for example, is a common data modeling language, and devices use YANG models to structure the information they transmit to a collector over protocols such as NETCONF or RESTCONF. This presupposes that the network device supports these protocols, which may not be the case in many legacy networks.
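
To make the value of structured data concrete, the sketch below parses a small JSON document shaped loosely like the JSON encoding of the ietf-interfaces YANG module. Treat the payload as an assumption for illustration: the interface names and values are invented, and a real device may expose different models and fields.

```python
import json

# Illustrative payload only; not captured from a real device.
payload = """
{
  "ietf-interfaces:interfaces": {
    "interface": [
      {"name": "eth0", "oper-status": "up"},
      {"name": "eth1", "oper-status": "down"}
    ]
  }
}
"""


def down_interfaces(document):
    """Return the names of interfaces whose operational status is down."""
    interfaces = document["ietf-interfaces:interfaces"]["interface"]
    return [i["name"] for i in interfaces if i["oper-status"] == "down"]


document = json.loads(payload)
print(down_interfaces(document))
```

Because the structure is defined by the model, a collector can pick out fields by name instead of scraping free-form text.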

Even with all the advantages streaming telemetry represents, SNMP is not dead. It's a time-tested protocol that's useful in smaller networks. In spite of its limitations, SNMP is the underlying protocol for many robust monitoring tools. However, as networks scale and increase in complexity and performance requirements, SNMP falls short.

Today, with the proliferation of on-box and off-box programmability tools, deploying this type of telemetry to overcome the limitations of SNMP is becoming the new way to know exactly what's happening -- as it occurs -- on the network.
