Artificial intelligence for IT operations pays off -- with time

IT professionals say AIOps holds great potential to optimize day-to-day operations tasks. First, though, teams must lay a foundation of data, consistency and trust.

On the surface, AIOps might seem like the latest IT buzzword, but for enterprises that have adopted artificial intelligence for IT operations, the benefits and challenges of that move are quite real.

Overall, AIOps -- the use of data analytics and machine learning to streamline common IT operations tasks, from incident response to root cause analysis -- is in its early days, but it's where enterprises are headed.

"[AIOps] is the only way that we're going to be able to achieve another sort of gain in efficiency on managing and deploying workloads and operations," said Carl Brooks, analyst at 451 Research.

In fact, for some companies, AIOps feels like the next logical step after embracing DevOps and other IT practices that heavily emphasize automation.

"AIOps, to us, is kind of a continuation of our transformation into agile, DevOps and continuous delivery -- it was enabled by all of those things to allow us to have an infrastructure that's automated," said David Wilson, senior director of infrastructure and architecture at Paychex, an HR software provider based in Rochester, N.Y.

Just enough provisioning

Organizations that implement AI for IT operations can realize optimized resource allocation and capacity management.

That's the case at Paychex, which uses AIOps to proactively and automatically route customers to a set of optimized infrastructure resources, based on the size of their workloads. The company wanted to tailor its services to user demand across its diverse customer base, which ranges from small organizations that use Paychex HR and payroll software for just a handful of employees to large, multisite enterprises that employ more than a thousand workers.

Before the company implemented intelligent, real-time traffic routing via AIOps, it overprovisioned its IT infrastructure, which spans an on-premises environment and the Microsoft Azure public cloud. Even then, the IT team ran the risk of users competing for resources.

Paychex uses Cisco's AppDynamics application performance management platform to gain insight into the type and size of transactions its customers perform. Custom-built machine learning models consume that data and work alongside customized automation and an F5 load balancer to dynamically route customers to the appropriate set of compute resources.
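As a rough illustration of size-aware routing in general, the sketch below maps a workload score to a resource pool. It is not Paychex's implementation: the pool names, the thresholds and the `workload_score` formula are all hypothetical, and a simple threshold rule stands in for the custom machine learning models the company describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    max_score: float  # largest workload score this pool is sized for


# Hypothetical pools, ordered from smallest to largest capacity.
POOLS = [
    Pool("small", 100.0),
    Pool("medium", 1_000.0),
    Pool("large", float("inf")),
]


def workload_score(transactions_per_min: float, avg_payload_kb: float) -> float:
    """Toy stand-in for a learned model: heavier, more frequent traffic scores higher."""
    return transactions_per_min * avg_payload_kb / 10.0


def choose_pool(transactions_per_min: float, avg_payload_kb: float) -> str:
    """Return the name of the first pool whose capacity covers the score."""
    score = workload_score(transactions_per_min, avg_payload_kb)
    for pool in POOLS:
        if score <= pool.max_score:
            return pool.name
    return POOLS[-1].name


print(choose_pool(5, 20))      # light customer
print(choose_pool(5000, 50))   # heavy customer
```

In a real deployment, the pool name would be handed to the load balancer's own routing rules; that integration is outside the scope of this sketch.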

"[Our infrastructure] looks homogenous, and we manage it in a homogenous way, but it's aware of who the customer is and responds that way when they're in the system doing work," Wilson said.

Detective work

Another common use for AIOps is automated event correlation and root cause analysis, which the engineering team at Barracuda Networks, a network security company based in Campbell, Calif., tapped into when it modernized deployments.

Barracuda migrated to AWS and adopted microservices and containers. With this newfound agility and the ability to modify infrastructure on the fly, the IT team oversees a much more dynamic environment with a complex set of dependencies.

[Figure: AIOps stages. AIOps processes consist of four primary stages.]

"It's just a little bit more noisy from a monitoring perspective," said Lior Gavish, senior vice president of engineering for Barracuda's email protection services. "You might get seven different alerts on the same thing, because it's a chain reaction."

To alleviate alert fatigue and pinpoint root causes more quickly, Barracuda adopted SignifAI's AIOps tool, which correlates events and automates root cause analysis to produce fewer, prioritized alerts. SignifAI was acquired by New Relic in 2019.
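To make the idea of event correlation concrete, here is a minimal sketch that collapses a chain reaction of alerts into one incident. It is not SignifAI's algorithm, which the article does not describe. It assumes a hypothetical dependency map (`DEPENDS_ON`), alerts given as `(timestamp_seconds, service)` pairs and a fixed 60-second window, and it treats the deepest alerting dependency as the likely root cause.

```python
from collections import defaultdict

# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON = {
    "web": ["api"],
    "api": ["db", "cache"],
    "db": [],
    "cache": [],
}


def find_root(service, alerting, depends_on):
    """Walk down the dependency chain while a dependency is also alerting."""
    seen = set()
    current = service
    while current not in seen:
        seen.add(current)
        failing = sorted(d for d in depends_on.get(current, []) if d in alerting)
        if not failing:
            return current
        current = failing[0]
    return current  # cycle guard: stop at the first repeated service


def summarize(burst, depends_on):
    """Group one burst of alerts by the root cause each alert traces back to."""
    alerting = {service for _, service in burst}
    by_root = defaultdict(set)
    for _, service in burst:
        by_root[find_root(service, alerting, depends_on)].add(service)
    return [{"root_cause": r, "alerts": sorted(s)} for r, s in sorted(by_root.items())]


def correlate(alerts, depends_on, window_s=60):
    """Split alerts into bursts separated by quiet gaps, then summarize each burst."""
    incidents, burst = [], []
    for ts, service in sorted(alerts):
        if burst and ts - burst[-1][0] > window_s:
            incidents.extend(summarize(burst, depends_on))
            burst = []
        burst.append((ts, service))
    if burst:
        incidents.extend(summarize(burst, depends_on))
    return incidents


alerts = [(0, "db"), (5, "api"), (9, "web"), (500, "cache")]
for incident in correlate(alerts, DEPENDS_ON):
    print(incident)
```

Here, three alerts that fire within seconds of each other are reported as a single incident rooted at the database, while the later, unrelated cache alert stays separate.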

The tool has simplified how Barracuda's team sets up an IT monitoring environment as well, Gavish said. It can integrate with more than 60 other monitoring systems, open source and commercial, without manual integration and alert configuration from the IT team.

"[SignifAI] essentially hooks into all these systems and pulls the data proactively," Gavish said. "We don't have to set up dashboards for each particular metric and each particular source of information. It automatically tracks anything that we add into our infrastructure."

The ability to make sense of monitoring data and correlate events more quickly was a key driver for Experian, a credit reporting firm, to adopt AI-enabled tools for IT operations. The company uses Dynatrace to monitor its hybrid cloud infrastructure, detect dependencies and pinpoint the root cause of issues. Dynatrace's monitoring data feeds into ServiceNow, where it's raised to the incident level for an operator to act upon.

"It takes people out of the world of really trying to look for where the problem is and work out who the culprit is to knowing the culprit, making sense of the data and being able to take action immediately," said Jonathan Hayes, vice president of global IT service excellence at Experian, based in Costa Mesa, Calif.

Why AI doesn't come easy

To realize the efficiency gains of AIOps, organizations typically have to overcome both technical and organizational challenges.

Any automated IT processes, including those that underpin AIOps, rely on consistency. Enterprises must standardize incident response workflows so that an AIOps tool can identify patterns and learn the appropriate way to address common issues.

Alongside automation, data is at the core of any AIOps strategy. Effective data collection and management are a must before the implementation gets off the ground. Any data fed into an AIOps system, whether from log files or help desk tickets, should be clean, categorized and complete.
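One way to picture "clean, categorized and complete" is a pre-ingestion check on each record. The sketch below is only an assumption about what such a check might look like: the field names in `REQUIRED_FIELDS` and the category list are hypothetical and would differ for any real ticketing or logging system.

```python
# Hypothetical schema for a help desk ticket.
REQUIRED_FIELDS = ("id", "opened_at", "category", "description")
KNOWN_CATEGORIES = {"network", "storage", "application", "access"}


def check_record(record):
    """Return a list of problems with the record; an empty list means it passes."""
    problems = []
    # Complete: every required field is present and non-blank.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing {field}")
    # Categorized: the category, if given, is one the team has agreed on.
    category = str(record.get("category") or "").strip().lower()
    if category and category not in KNOWN_CATEGORIES:
        problems.append(f"unknown category: {category}")
    return problems


good = {"id": 1, "opened_at": "2019-03-01T10:00:00",
        "category": "Network", "description": "link down"}
bad = {"id": 2, "category": "misc", "description": " "}

print(check_record(good))
print(check_record(bad))
```

Records that fail the check can be routed to a cleanup queue rather than fed to the AIOps tool as they are.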

Some IT team members are skeptical of AIOps tools taking complex processes out of human hands. This is true even of admins who are well accustomed to automation scripts and tools, such as Chef and Puppet, since these approaches are still under the thumb of the implementer, 451's Brooks said.

"The inherent conservatism of most IT operators makes them look upon stuff like [AIOps] with incredible suspicion," he said.

Building trust in AI for IT operations definitely takes time, Barracuda's Gavish agreed. In the early stages of an AIOps project, a tool might produce false negatives or false positives as it adjusts to the environment -- but that is no reason to give up on it.

"I think that's the main challenge -- just to make sure that people trust [an AIOps system] and that the type of alerts that it generates are actionable," he said.

Lastly, an AIOps strategy demands operations admins broaden their purview of both IT and business initiatives, as they offload repetitive break/fix tasks and take on strategic projects.

"Your worldview and understanding of a complex environment have to be much bigger than 'follow the knowledge article, do these 10 steps and close the incident,'" Paychex's Wilson said. Nevertheless, he added, "That's a good problem to have."
