Getty Images

Monte Carlo aids data observability with root cause analysis

The vendor's new capabilities are designed to enable Databricks and GitLab users to quickly discover and resolve the underlying causes of data quality problems.

Data observability specialist Monte Carlo on Monday unveiled root cause analysis capabilities aimed at making it faster and easier to identify and resolve data quality incidents.

The vendor's new root cause analysis tools are targeted specifically at diagnosing changes to Databricks query code and GitLab pull requests. It plans to add similar capabilities for addressing incidents in other platforms as well, according to Lior Gavish, Monte Carlo's co-founder and CTO.

Root cause analysis is a method of discovering the underlying reasons for code changes that lead to poor data quality.

By identifying the cause of a change as it occurs, root cause analysis enables developers and engineers to address the change before it has consequences, making it a significant part of the data management process, according to Kevin Petrie, an analyst at BARC U.S.

"A viable data observability program encompasses detection, assessment and remediation," he said. "Once you detect an issue, you need to find the root cause so you can assess, triage and remediate, for example, by debugging or replacing a bad data pipeline."

Based in San Francisco, Monte Carlo is a data observability vendor whose platform enables customers to monitor data as it progresses through the pipeline from its ingestion through integration and ultimately analysis. Its purpose is to ensure that the data used to train models and applications, feed dashboards, and inform decisions is accurate and up to date.

A viable data observability program encompasses detection, assessment and remediation. Once you detect an issue, you need to find the root cause so you can assess, triage and remediate.
Kevin PetrieAnalyst, BARC U.S.

In 2023, the vendor added data observability for vector databases, which have become a crucial part of retrieval-augmented generation pipelines used to train generative AI tools. Also that year, Monte Carlo launched its own generative AI capabilities, enabling customers to create SQL code using natural language and alerting users to coding problems with suggested fixes.

New capabilities

Data quality is imperative, perhaps more so now than ever as enterprise interest in AI increases and more processes get automated.

With data being the foundation for analytics and AI, data needs to be accurate for the decisions based on analytics and AI to likewise be accurate. Meanwhile, with data volume increasing exponentially and the complexity of data also rising, it's impossible for even teams of humans to monitor every data point and data set for quality.

In response, vendors such as Monte Carlo and other data observability specialists including Acceldata, Metaplane and Soda have developed platforms that automatically monitor data for quality and alert users when incidents occur.

Those alerts, however, have to do with the data points and data sets rather than the underlying code. Therefore, to remedy an incident, data engineers and other experts still have to trace the incident back to its source -- its root cause -- before it can be fixed. That process can take on average 15 hours, according to a survey of more than 200 data professionals by Monte Carlo and Wakefield Research.

That's nearly two full workdays just to find the source of a single incident and get it remedied. Root cause analysis aims to eliminate much of the time and expense related to discovering and resolving the changes that cause data quality problems.

Data quality issues can often be traced to one of three causes, according to Gavish: problems with the data itself, something amiss in a system or trouble with code.

Monte Carlo's new root cause analysis capabilities specifically target issues with Databricks and GitLab code, whether they are simple coding mistakes by developers and engineers or unforeseen consequences of intentional updates.

Developers and engineers get alerts when incidents are detected, including information that correlates the incident with the specific change that caused it. As a result, downtime -- the time it takes to resolve data quality problems -- is reduced by about 80%, according to Monte Carlo.

"When it comes to resolving data issues, speed is everything," Gavish said. "Being able to quickly root cause code-related issues leads to faster resolution. When you have visibility into data, systems and code issues all in one platform, it's much easier to understand the root cause."

Five problem-solving steps in root cause analysis.
How to approach root cause analysis.

Specifically, using Monte Carlo's new root cause analysis capabilities, the data observability vendor's customers can easily view Databricks query logs and query changes for each table. By doing so, they can see whether there was a query change to that particular table or a table in another part of the data pipeline, and if that change is the cause of the problem.

Similarly, Monte Carlo's new capabilities enable GitLab users to see which pull requests are linked to which tables. This can help users understand when those requests occurred and new code was merged, and if that new code is causing a data quality issue.

Given the visibility they enable, Monte Carlo's new root cause analysis capabilities for Databricks query code and GitLab pull requests are significant for the data observability vendor's customers, according to Petrie.

"Data teams frequently revise transformation code to meet changing business requirements, adjust formats, filter columns and so on," he said. "While they try to minimize errors by branching and testing pipeline code, some problems inevitably get into production and break data quality. Monte Carlo helps data engineers spot those errors faster by autodetecting anomalous logs."

Given its potential impact for developers and engineers, Monte Carlo has plans to expand its data observability platform to include root cause analysis capabilities beyond Databricks and GitLab.

Databricks and GitLab are each popular environments for developers, with Databricks aggressively building an environment for developing generative AI, traditional AI and machine learning models during the past couple of years. However, many developers and engineers prefer other platforms for building data and AI models and applications.

To meet their needs, Monte Carlo plans to expand its root cause analysis capabilities beyond Databricks and GitLab, according to Gavish, though he did not specify which platforms the company plans to target next.

"We are constantly exploring and building new and stronger ways to enhance Monte Carlo's resolution capabilities," he said. "We believe strongly in empowering our customers to resolve data issues right where they start."

Future plans

With root cause analysis for Databricks query code changes and GitLab pull requests now available -- and plans already in place to add more root cause analysis capabilities -- Monte Carlo's product development roadmap is focused on three main themes, according to Gavish:

  • Further expediting incident resolution.
  • Expanding data observability to cover the entire data management process from ingestion through analysis.
  • Applying data observability to AI applications.

More root cause analysis addresses expediting incident resolution. Recent integrations with Informatica and Microsoft's Azure Data Factory aim to expand Monte Carlo's data observability capabilities to more of the data management process. And integrations with vendors such as vector database company Pinecone are geared toward applying data observability to AI development.

"It's our vision to continue evolving Monte Carlo into a platform that can not only detect, but resolve and ultimately prevent issues from wherever they derive in our customers' data stacks," Gavish said.

Petrie, meanwhile, suggested that Monte Carlo expand its data observability capabilities beyond monitoring for data quality.

Data observability does not have to be limited to the data itself, he noted. It can extend to monitoring the performance of the processes that make up a data pipeline and prepare data from the time it's first ingested to the points when it's ready to inform analysis.

"Monte Carlo traditionally focuses on data quality observability," Petrie said. "I'd be interested to see them expand in adjacent spaces such as data pipeline observability, which focuses more on the performance of underlying infrastructure."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.

Next Steps

Experts: How to get data ready for AI development

Dig Deeper on Data management strategies