Machine learning network monitoring shows AIOps' promise

Machine learning network monitoring tools highlight the promise of artificial intelligence for IT operations, helping networking pros contextualize data and turn it into action.

Stephen Hawking famously warned that artificial intelligence could pose the single greatest threat to human existence, yet AI proponents predict it will help improve workplace productivity, address environmental crises and cure life-threatening diseases. But, for now, the debate remains primarily academic, as AI lacks the necessary maturity to ferry mankind to heaven or drag it to hell.

AI for IT operations (AIOps) boasts similarly vast, but as yet mostly unrealized, potential. Many experts anticipate artificial intelligence will eventually revolutionize the enterprise network -- automating complex processes, making sophisticated decisions and requiring minimal human intervention. But we aren't there yet.

"The promise of AIOps still far exceeds reality," said Gartner analyst Sanjit Ganguli. He added that, ultimately, the industry could see unified architectures that create automated, continuous closed loops between monitoring, automation and service management systems, based on information streams from across the enterprise -- network, application, infrastructure, customer sentiment and environment.

"Are vendors moving toward this perfect world of self-driving environments? Yes," he said. "Are we close? No."

But the industry has seen early progress with machine learning network monitoring and automation tools, he added. Machine learning -- a subset of AI -- uses statistical algorithms that learn patterns from historical data and apply them to new, often real-time, data streams, making their predictions more accurate as they accumulate experience. Machine-learning-driven monitoring technology from vendors such as Hewlett Packard Enterprise's Aruba, Juniper's Mist, Moogsoft, Nyansa and Splunk allows network pros to improve performance proactively instead of waiting for problems to arise -- a step Gartner analysts cite as critical on the road to next-generation NetOps, or NetOps 2.0.

"The goal of AIOps, intent-based networking and all these newer technologies is that the network operates more efficiently," Ganguli said. "This ultimately means network engineers can focus more on the cool stuff -- optimizing their environments, rather than always fighting fires. So, it's an exciting time for the industry."

From raw data to actionable insights with machine learning

Virginia Tech, in Blacksburg, Va., began using Aruba's NetInsight machine learning network analytics platform as a beta customer in the summer of 2017. The university's IT team quickly found that the cloud-based tool gave them an unprecedented view into their massive wireless environment, which spans 200 buildings and serves around 38,000 faculty, staff and students, plus satellite campuses around the state. More than 70,000 unique devices connect to the university's Wi-Fi daily.

"Before NetInsight, we were really reactive," said Steven Lee, program director for network engineering and operations. He explained the university's legacy analytics tools required his team to manually comb through data metrics to spot anomalies in an effort to get ahead of any percolating problems. Otherwise, they relied on end users to report connectivity issues -- a step he said the student population often doesn't bother to take.

"They'll just ignore it and hope it fixes itself," Lee said. "We really didn't have a good feel for what the end user was experiencing. We lacked assurance that the design we put forward was the best one for a given location."

Ganguli said machine learning network monitoring tools like NetInsight can help IT pros solve problems they didn't even know they had by helping make sense of the massive amounts of data their IT operations systems generate.

At Virginia Tech, Lee said NetInsight's algorithms help his team turn raw metrics into actionable information.

"It's provided additional data and also helped contextualize it," he said. "We now have a better understanding of our shortcomings and can address those."

During a network upgrade in 2015, for example, his team decided to experiment with the wireless optimization settings in a particular residence hall, but failed to restore the standard configuration once they finished testing. The underperforming settings remained in place for two years, until NetInsight flagged the anomaly and suggested a possible fix.

"Our engineers looked at it and said, 'Oh, yeah. We made that change because we wanted to see what the effect was, and then we forgot about it,'" Lee said.
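How NetInsight detects anomalies like this one isn't described here. As a rough illustration of the general idea, the minimal Python sketch below compares a single metric across comparable buildings and flags outliers using a modified z-score based on the median absolute deviation (MAD), a standard robust statistic. The metric, building names, values and 3.5 cutoff are hypothetical assumptions, and this is not Aruba's algorithm.

```python
import statistics
from typing import Dict, List


def flag_anomalies(metric_by_site: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return sites whose metric sits far from the median of their peers.

    Uses the modified z-score, 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. 3.5 is a commonly used cutoff (an assumption here).
    """
    values = list(metric_by_site.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # MAD is zero when more than half the sites share one value;
        # this simple sketch then flags nothing.
        return []
    return [
        site
        for site, v in metric_by_site.items()
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical average client throughput (Mbps) per residence hall.
throughput = {
    "hall-a": 52.0,
    "hall-b": 48.0,
    "hall-c": 50.0,
    "hall-d": 49.0,
    "hall-e": 51.0,
    "hall-f": 21.0,  # e.g. a hall left on non-standard test settings
}
flagged = flag_anomalies(throughput)
print(flagged)
```

Comparing each building with its peers, rather than with a fixed threshold, is one way a forgotten configuration could stand out even years after the change was made.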

In addition to catching inevitable human errors, the machine learning network analytics tool also informs Virginia Tech's future wireless strategy, helping the IT team put its philosophy of continuous improvement into action.

"Now, instead of guessing what our users want or need, we can see, 'In this building, the number of users has increased tenfold in the last year,'" Lee said. "We actually have the data to support the need for an upgrade or a redesign."

After an upgrade, the platform can also quantify the return on investment by comparing new performance metrics against historical benchmarks, helping Lee's team both prove a project's value and justify similar initiatives elsewhere in the network.

Next-level event correlation and suppression with AIOps

KeyBank, a regional bank headquartered in Cleveland, recently deployed Moogsoft's AIOps platform in an effort to minimize reliance on manual IT tasks and reduce alert fatigue among its network operations center (NOC) staff. Moogsoft's machine learning algorithms process multiple data streams, automatically suppressing innocuous events and grouping related alerts into actionable narratives the company calls Situations.
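Moogsoft's actual algorithms aren't described here, and its platform relies on machine learning rather than fixed rules. Purely to illustrate the two steps named above, suppression and grouping, the minimal Python sketch below uses a simple rule-based stand-in: it drops events below an assumed severity threshold, then groups the remaining alerts by service and time proximity into "situations." The Alert fields, severity scale, five-minute window and sample events are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    service: str      # hypothetical grouping key
    severity: int     # hypothetical scale: 0 = informational ... 5 = critical
    message: str


@dataclass
class Situation:
    service: str
    alerts: List[Alert] = field(default_factory=list)


def suppress(alerts: List[Alert], min_severity: int = 2) -> List[Alert]:
    # Assumption: anything below min_severity is treated as innocuous noise.
    return [a for a in alerts if a.severity >= min_severity]


def group(alerts: List[Alert], window: float = 300.0) -> List[Situation]:
    # An alert joins its service's open Situation if it arrives within `window`
    # seconds of that Situation's most recent alert; otherwise a new one opens.
    situations: List[Situation] = []
    open_by_service: Dict[str, Situation] = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_by_service.get(alert.service)
        if current is not None and alert.timestamp - current.alerts[-1].timestamp <= window:
            current.alerts.append(alert)
        else:
            current = Situation(alert.service, [alert])
            situations.append(current)
            open_by_service[alert.service] = current
    return situations


raw = [
    Alert(0, "payments", 4, "latency high"),
    Alert(30, "payments", 3, "upstream timeout"),
    Alert(45, "payments", 1, "heartbeat ok"),
    Alert(60, "dns", 4, "resolution failure"),
    Alert(1000, "payments", 4, "latency high"),
]
kept = suppress(raw)
situations = group(kept)
# Fraction of raw events that did not surface as a separate item for the NOC.
reduction = 1 - len(situations) / len(raw)
for s in situations:
    print(s.service, [a.message for a in s.alerts])
```

A production system would learn which events are innocuous and which alerts belong together instead of hard-coding a threshold and a time window; the fixed rules here only make the two steps easy to see.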

Mick Miller, senior DevOps architect at KeyBank, estimated Moogsoft has already slashed KeyBank's operational event noise by 98%.


"That translates to a significant reduction of alerts our NOC team needs to process daily," he said, adding that, without the distraction of constant false alarms and unnecessary operational updates, staff is now free to resolve salient issues much faster.

The KeyBank team also pays close attention to a feature Moogsoft calls Similar Situations, which correlates current problems with historical ones, highlighting any past takeaways or fixes for consideration.  

"Our NOC staff can look at prior situations and see how they were resolved," Miller said, adding that the quick-reference feature leads to faster resolutions. "That allows our critical incident teams to devote more attention to prevention."
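How Similar Situations decides which past incidents match isn't described here. One way to illustrate the idea is to describe each situation as a set of features, such as affected services and alert types, and rank historical situations by Jaccard similarity: the size of the overlap divided by the size of the union. The minimal Python sketch below assumes that approach; the feature sets, resolutions, top-k limit and 0.3 cutoff are hypothetical, and it is not Moogsoft's method.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class PastSituation(NamedTuple):
    features: FrozenSet[str]  # hypothetical features: services, alert types, hosts
    resolution: str           # free-text note on how the NOC resolved it


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    # Overlap divided by union; 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_situations(
    current: FrozenSet[str],
    history: List[PastSituation],
    top_k: int = 3,
    min_score: float = 0.3,
) -> List[Tuple[float, PastSituation]]:
    # Score every past situation, keep those above the cutoff, best first.
    scored = [(jaccard(current, past.features), past) for past in history]
    matches = [(score, past) for score, past in scored if score >= min_score]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return matches[:top_k]


history = [
    PastSituation(frozenset({"payments", "latency", "db-primary"}),
                  "Failed over to replica database"),
    PastSituation(frozenset({"dns", "resolution-failure"}),
                  "Restarted resolver"),
    PastSituation(frozenset({"payments", "latency", "load-balancer"}),
                  "Drained unhealthy pool member"),
]
current = frozenset({"payments", "latency", "db-primary", "timeout"})
matches = similar_situations(current, history)
for score, past in matches:
    print(f"{score:.2f}  {past.resolution}")
```

Surfacing the stored resolution alongside each match is what lets an operator reuse a past fix instead of diagnosing the problem from scratch.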

Ganguli said he expects machine learning network monitoring tools will increasingly move toward what he calls "true AI," with natural language processing and automated, closed-loop changes -- putting technology firmly in the driver's seat. Rather than simply flagging a problem and suggesting a correction, machine-learning-driven software would make independent adjustments to the network, without any human intervention.

"The network is dynamic, so I think automatic, on-the-fly changes could have value," Lee said. "But, honestly, I don't think I'd be comfortable allowing that to happen yet."

Before handing AI the keys to the network, "we need to police the technology and gain confidence in how it works," he added.  

In the meantime, Ganguli said network managers should consider how next-generation machine learning network monitoring and automation tools figure into their own long-term strategic planning, adding that the majority of management offerings on the market are still data-centric, rather than insight- or action-centric.

"They're good at reporting to you what's going on, but less good at telling you why," he said.

Ganguli suggested networking pros engage with their existing vendors to find out if and how they plan to incorporate AI and machine learning into their products and services.  

"We absolutely encourage enterprises to have a requirement that any offering they buy has either, A, existing functionality or, B, a roadmap to move in the right direction," he said.
