Datadog DASH updates push into fresh IT automation turf
A series of product updates at Datadog DASH broke out of the vendor's usual observability domain and into territory held by Atlassian, PagerDuty and others.
More than a dozen product updates highlighted during Datadog's DASH conference this week extended the direct actions users can take from within its observability platform, venturing into fresh competition with larger platform vendors.
Keynote presentations at Datadog DASH led with traditional fare, such as a new LLM observability tool that can troubleshoot and trace issues within large language model-powered applications and related data systems, as well as an integration between its Datadog Agent and the OpenTelemetry Collector. But the rest of the 14 product updates featured this week focused on new ways users can automatically troubleshoot and remediate issues in both apps and infrastructure, from Kubernetes autoscaling and live debugging to security incident response and on-call incident management.
"[IT] automation, remediation and incident management are all big steps forward and into the Atlassian install base [and where], for on-call, PagerDuty is a leader," said Stephen Elliot, an analyst at IDC. "Buyers have more quality products to choose from than ever before, [as] Datadog announcements show they want to chip away using their platform … in these disciplines."
Datadog casts wide IT automation net
Among the IT automation features Datadog added to its platform this week is a private beta version of a fresh Kubernetes autoscaling utility. It draws on upstream open source mechanisms such as Horizontal Pod Autoscaler, Vertical Pod Autoscaler, Karpenter Kubernetes compute provisioning, and Cluster Autoscaler and combines them into a new custom resource definition. This mechanism can then optimize Kubernetes clusters based on observability data about their resource usage to trim unnecessary infrastructure costs or address performance bottlenecks.
Users will have the option of using Datadog Kubernetes Autoscaling to optimize system resource usage on a one-time basis or continuously, said Danny Driscoll, senior product manager for container and Kubernetes monitoring, during a Datadog DASH keynote presentation.
"In our latest research, we observed that more than 65% of Datadog-monitored Kubernetes containers are still using less than half of the requested memory and CPU resources," Driscoll said. "There's still more we can be doing here."
Further, Datadog DASH IT automation updates included a Live Debugger, now in private beta, that uses production data to reproduce application errors and guide developers through remediating them. Another private beta feature, Change Tracking, correlates application changes with IT infrastructure issues.
Change Tracking will also feed into two more early-stage updates previewed this week: a new product named Datadog On-Call and a new version of Datadog's Bits AI copilot that will perform autonomous investigations into IT incidents. That Bits AI version is based on a generative AI model the company has been training in a dedicated incident simulation environment, said Sajid Mehmood, vice president of engineering at Datadog, during a keynote presentation.
Mehmood demonstrated autonomous investigation workflows, such as automatically analyzing real user monitoring data to see which users are affected by an issue, suggesting an incident be declared, generating an incident summary and status page updates, and contributing to an incident response thread in Slack. In Mehmood's demo, the Bits AI bot took cues from that conversation to surface relevant telemetry, such as Change Tracking data. Additionally, it generated a first draft of an incident postmortem during the demo.
"To transform Bits AI into an independent investigator, we invested heavily in AI agents … optimized specifically for the multi-user, multi-threaded environment of incident response," Mehmood said, partially echoing the AI agent strategy Atlassian articulated during its Team 24 conference in April. Dynatrace also offers observability-driven incident remediation and AIOps features, including for security.
The Bits AI updates are the most compelling among Datadog's announcements this week, said Andy Thurai, an analyst at Constellation Research.
"The autonomous AI agent can streamline the process of investigating alerts and responding to incidents, similar to Splunk's App for Incident Intelligence," he said.
These features will also underpin a new product now in private beta, named Datadog On-Call, that could directly challenge PagerDuty and Atlassian Opsgenie. Datadog's pitch for On-Call is that it reduces context switching for IT ops pros between observability and incident management tools, through an integration with Datadog's mobile app that builds in direct access to observability data.
"What I just showed you isn't a paging solution," said Daljeet Sandu, product manager at Datadog, after demonstrating On-Call integration with the mobile app during a DASH keynote presentation. "What I showed you is a single platform for monitoring, securing, paging and investigating issues on the fly."
Other observability and DevOps platform vendors have tried to encroach on PagerDuty's lead in incident management and haven't gotten very far, Thurai said.
"The acquisitions of VictorOps and Opsgenie didn't help Splunk or Atlassian crack that nut that much," Thurai said, estimating PagerDuty still dominates 70% to 80% of the incident management market. "If [Datadog is] successful, this is a huge market potential for them. While it may not be possible to rip and replace entrenched PagerDuty customers, [some] customers might be open to combining that with Datadog observability. Only time will tell."
Datadog expands security automation, challenging giants
In the security realm, Datadog Security added agentless scanning to its existing Cloud Security Management tool. A keynote demonstration showed the newly generally available feature automatically populating the product's Security Inbox interface with a prioritized list of security issues. Users can click on individual issues surfaced by agentless scanning to see a new security context map visualization that includes a remediate button.
Among the remediation options users can invoke in that Cloud Security Management workflow is opening an automatically generated infrastructure as code (IaC) pull request to fix an issue. Users also have the option of sending messages to others in their organization through Slack or creating Jira tickets to kick off remediation.
Datadog Security further expanded to include interactive application security testing (IAST) in a new product shipped this week named Datadog Code Security. That product joins a new software composition analysis tool, Datadog SCA, launched in February, to address the far-left side of DevSecOps pipelines.
"The IaC auto-remediation is a great step toward a more developer-friendly and frictionless remediation workflow rather than just creating a Jira ticket or Slack message," said Katie Norton, an analyst at IDC. "While I would not say this is revolutionary, it is certainly following a trend toward providing developers both the context and the fix in their core tools and making the remediation process much less time-consuming.
"It's also good to see Datadog expand their code security capabilities to IAST and [add] better code-level insights to complement their runtime understanding of the application," Norton said.
Datadog comes from an application performance management (APM) background and appears primarily focused on application security, while Cisco-Splunk security automation features have roots in infrastructure log monitoring and are geared toward security operations center analysts. Cisco-Splunk, considered leaders in the security information and event management market, also already have the ears of large enterprise IT buyers.
But Datadog appears ready to stake out new territory in the security automation market under Sara Varni, who was hired as Datadog's chief marketing officer in February after holding leadership positions at Twilio and Salesforce, Elliot said.
"They claim to have 6,000 customers using one or more security products," he said. "[But] Datadog has to continue to increase awareness. … Customers often don't know what they offer. Their new CMO should play a major role in this."
LLM Observability, OpenTelemetry tie-in lead APM updates
Finally, Datadog countered advances by a more traditional competitor, New Relic, with the launch in general availability of its LLM Observability tool this week. New Relic rolled out its own AI Monitoring in April and partnered with Nvidia to integrate that product with the GPU vendor's NIM microservices orchestration framework this week.
Enterprises can expect further consolidation between DevSecOps tools and MLOps, as also evidenced by JFrog's acquisition of Qwak this week, according to Norton.
"Generally, the industry has recognized that to have a holistic view of application risk that enables effective prioritization and efficient remediation, you can't understand the code and the infrastructure in silos," she said. "Further, organizations continue to want to consolidate tools and vendors."
However, Datadog LLM Observability could hit a snag with some enterprises concerned about their data being used to train LLMs. A disclaimer in Datadog's documentation states, "By using LLM Observability, you acknowledge that Datadog is authorized to share your Company's data with OpenAI Global, LLC for the purpose of providing and improving LLM Observability."
"Not many enterprises are comfortable with sharing their data with OpenAI unless they have a direct contractual agreement," Thurai said. "If they offer a private instance of [GPT] just for use with Datadog and the [right] contractual terms, I can see enterprises getting comfortable using this."
Datadog isn't alone in its ambitions to be the primary source of IT automation control for enterprises. The growth of the OpenTelemetry standard for observability data collection has made it easier than ever for IT pros to switch vendors amid consolidation trends.
Take GitHub, for example. The Microsoft-owned source control vendor began to migrate to open source OpenTelemetry collectors behind its existing APM tooling as soon as an early version of the utility became available in May 2021. By the time OpenTelemetry reached general availability in 2022, GitHub had begun shifting all its hundreds of application services to it, which took until July of 2023.
Because of this long-term migration, switching from its APM vendor of seven years to Datadog took four months, according to Michele Titolo, principal software engineer at GitHub, during a Datadog DASH keynote presentation.
"[It was] four months from our initial, 'Let's just think about migrating,' to actually performing the migration and getting onto the [Datadog] platform," Titolo said. She did not name the previous APM vendor. "That's the power of OpenTelemetry and using vendor-agnostic tooling."
Datadog this week embedded the OpenTelemetry Collector utility into its Datadog Agent. This melding of data collection tools adds fleet-level management and enterprise support for OpenTelemetry, automatic instrumentation and fine-grained control over OpenTelemetry data management, according to Datadog DASH presentations.
Users can still bring their own OpenTelemetry collector if they want. But there's significant potential value in this integration for enterprises, said Gregg Siegfried, an analyst at Gartner.
"Other vendors in the space are working on similar things -- central collector management -- as part of their agent platform," Siegfried said. "There may be less reliance on [vendor] agents over time as more [workloads] migrate to OpenTelemetry. But now that we are in [an] intermediate state, having this kind of flexibility is a great option."
Beth Pariseau, senior news writer for TechTarget Editorial, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.