
Dynatrace users make headway with AIOps

Dynatrace's aggressive NoOps vision hasn't come to fruition, but some customers have begun to see real-world results with root-cause analysis and automated rollbacks.

LAS VEGAS -- Customer IT operations don't look the way Dynatrace executives thought they would four years ago, but their resemblance to that long-imagined world of self-healing IT systems has finally started to grow.

Dynatrace was outspoken under its previous CEO in 2019, even amid industry skepticism, about a future of NoOps, in which unsupervised AI runs the entire IT infrastructure without human intervention. Company officials said at the time that Dynatrace already had a NoOps environment internally and began to offer customer training programs on NoOps concepts.

Four years later, as the company's customers reconvened in person for the first time since 2019, following a hiatus due to COVID-19, that messaging had changed somewhat. While AIOps is still very much in play for sophisticated, data-driven IT automation, its most extreme form, NoOps, wasn't mentioned.

In CEO Rick McConnell's keynote presentation at the company's Perform conference this week, he described an ideal environment as one in which automation helps human operators, rather than one in which they are not present.

"We are not focused on enabling an army of people sitting in a network operations center looking at a sea of alerts trying to figure out the needle from the haystack," he said. "When something does go wrong, we want to provide precise analytics and instrumentation as to where it happened, so you're not searching for that."

So far, adoption of such sophisticated automation and incident prevention has been slow. Generally, AIOps auto-remediation is still rare in production enterprise environments, according to industry analysts.

"We're at the very early stages of that kind of self-driving autonomous operation, the idea of avoiding an incident as opposed to using an event or an incident as a catalyst to take action," said Gregg Siegfried, an analyst at Gartner. "There are not many, if any, folks that are there yet … among Gartner clients. They're not going to jump into the self-driving [concept] before it's pretty well established."

However, the fact that AIOps remains a work in progress doesn't mean it's been stagnant -- and for some customers, the progress it has made has been significant.

Dynatrace CEO Rick McConnell gives a keynote presentation at this week's Perform conference.

Vivint Smart Home auto-remediates critical apps

When Michael Cabrera was hired as director of site reliability engineering (SRE) in 2019 at Vivint Smart Home, a home automation company in Provo, Utah, his first task was to reassess the monitoring and IT automation tools the company used. After multiple proof-of-concept tests, Cabrera replaced Vivint's AppDynamics software with Dynatrace, which he finished rolling out in 2021.

Since then, Cabrera's team has used Dynatrace's Service-Level Objectives tool in its Cloud Automation Module to link user experience issues to a root cause within the IT infrastructure. Once the root cause is found, such as a slow disk query, Cabrera uses an integration between Dynatrace and the SaltStack infrastructure-as-code tool to automate routine remediation steps.

"We're having Dynatrace hook into our Salt and say, 'There appears to be a problem with this query. Let's stop it. This VM is running low on disk. Let's run a [command] and see if that clears it up. We've got a pod in Kubernetes that is slow-acting -- let's refresh that pod,'" he said.
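The pattern Cabrera describes, in which a diagnosed root cause triggers a matching routine fix, can be sketched as a simple runbook lookup. The following is a minimal illustration only, not Vivint's code and not the Dynatrace or Salt API: the `Problem` type, the root-cause labels and the action functions are all hypothetical, and the actions return a description of the step rather than executing anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Problem:
    """Hypothetical problem event: a root-cause label plus the affected target."""
    kind: str
    target: str


# Hypothetical remediation actions. A real setup would call out to an
# automation tool; here each one only describes the step it would take.
def stop_query(target: str) -> str:
    return f"stop slow query on {target}"


def clear_disk(target: str) -> str:
    return f"run disk cleanup on {target}"


def restart_pod(target: str) -> str:
    return f"restart pod {target}"


# Runbook mapping each known root cause to its routine fix.
RUNBOOK: Dict[str, Callable[[str], str]] = {
    "slow_query": stop_query,
    "low_disk": clear_disk,
    "slow_pod": restart_pod,
}


def remediate(problem: Problem) -> str:
    """Pick the routine fix for a known root cause, else escalate to a human."""
    action = RUNBOOK.get(problem.kind)
    if action is None:
        return f"escalate {problem.kind} on {problem.target} to on-call engineer"
    return action(problem.target)


if __name__ == "__main__":
    print(remediate(Problem("low_disk", "vm-7")))
    print(remediate(Problem("memory_leak", "api-3")))
```

Anything the runbook does not recognize falls through to a human, which matches the article's framing of automation as an aid to operators rather than a replacement for them.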

Other customers said they have begun to use Dynatrace tools to do automatic rollbacks of code changes in pre-production environments.

"Our intention is to cover the entirety of our stack with automated remediation and get to the point where I don't have to rely on an engineer to do a rollback, because Dynatrace has already done it for us," said Alex Hibbitt, engineering director of SRE and fulfillment at Albelli-Photobox Group, an e-commerce photo product company based in Amsterdam.

Retailer Best Buy consolidated from 15 monitoring tools onto Dynatrace for full stack observability after a disastrous incident on Thanksgiving Day in 2019. The company uses AIOps features for root-cause analysis and alert reduction, with the goal of adding automated quality gates to app development pipelines next.

"We were doing 50,000 notifications per day, and now we're down to about 1,500 meaningful enrollment problems," said Pete Krueger, director of reliability engineering at Best Buy, during a customer panel session at the Perform event. "We're looking … to use the anomaly engine to be able to say if a problem is related to a version deployment release."

Dynatrace overhauls Cloud Automation Module

Even IT pros who have overcome early mistrust of AI and embraced the concept of auto-remediation still face hurdles in rolling it out more broadly.

"Time -- that's the only thing between me and NoOps right now. I'm a believer in it," Cabrera said. "It's just a matter of time to put in remediation across the stack."

One such time-consuming task for Cabrera is using Terraform and the Dynatrace Monaco configuration-as-code tool to automate Kubernetes auto-scaling, he said.

For other Dynatrace users, pervasive AIOps first requires pervasive usage of Dynatrace observability tools, and that change can be slow.

"Turning off the legacy tools you've been using isn't as easy as it sounds," Krueger said during the panel session. "It's a religious debate."

Dynatrace reps also said they've knocked down another hurdle to AIOps adoption with two new platform features rolled out this week, AutomationEngine and AppEngine. These frameworks, which provide both no-code/low-code and as-code alternatives, can create event-driven workflows and custom apps, and underpin a revamped Cloud Automation Module, now renamed Automation.

Under the new Grail data management back end, Automation will support a broader range of potential workflows, including security automation, and be more open to user customization. AutomationEngine and AppEngine broaden and refine Dynatrace's support for building custom workflows and business logic accordingly, said Bonifaz Kaufmann, vice president of product for Cloud Automation at Dynatrace.

"We listened to our customers," Kaufmann said. "They had trouble adopting [Cloud Automation] and it was a little bit complex to set up -- now it's getting much easier."

AutomationEngine and AppEngine will be generally available for Dynatrace SaaS customers next quarter.

Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached on Twitter @PariseauTT.

A customer panel presentation at Dynatrace Perform with, from left, Debbie Umbach, Dynatrace; Pete Krueger, Best Buy; Dave Catanoso, U.S. Department of Veterans Affairs; Harbinder Panesar, Lloyds Banking Group.
