
Coalesce acquires data catalog upstart to aid transformation

After raising $50 million in venture capital funding in 2024, the startup acquired CastorDoc to add AI-powered data catalog capabilities to its data preparation platform.

Coalesce on Wednesday acquired CastorDoc, an AI-powered data catalog vendor whose capabilities enable customers to govern, organize and discover data.

Financial terms of the transaction were not disclosed.

Based in San Francisco, Coalesce is a data transformation startup aligned with Snowflake whose platform enables users of the Snowflake Data Cloud to cleanse, model and document data to ensure data quality and ready data for analytics and AI development.

In April 2024, Coalesce raised $50 million in venture capital funding, bringing its total funding to $81 million. At the time, co-founder and CEO Armon Petrossian said the vendor planned to use the funding to add AI-powered capabilities and expand beyond data transformation to appeal to a more widespread audience of potential customers.

Coalesce's acquisition of New York City-based CastorDoc and its data catalog represents that expansion beyond data transformation. The purchase adds AI-powered data catalog capabilities such as governance and observability, a move that Sanjeev Mohan, founder and principal of analyst firm SanjMo, called "great," given that catalog capabilities address data quality.

"Metadata is the most crucial ingredient for the success of data and AI projects," he said. "With a built-in catalog, Coalesce can accelerate customers' projects while providing governance capabilities."

In addition to Coalesce, dbt Labs focuses on data transformation, while broader data management vendors, including Informatica and Matillion, also provide data transformation capabilities.

Adding through acquisition

Data quality has always been important to effective data analysis. Good data leads to well-informed decisions, while bad data leads to poorly informed decisions.

However, surging interest in AI development is making data quality perhaps more important than ever.

AI applications such as assistants and chatbots are enabling more non-technical workers than ever to engage with data. Lacking the expertise of trained analysts, these workers may not identify incorrect or misleading AI outputs as easily, which places greater emphasis on the accuracy of AI tools and the underlying data used to train them.

Meanwhile, AI agents can act autonomously, surfacing insights, making recommendations and enabling enterprises to automate certain repetitive tasks and processes. While most enterprises ensure that humans monitor agentic AI outputs, reducing human input and placing more responsibility on machines similarly requires greater emphasis on AI accuracy and the data used to train models and applications.

While data transformation is one way to address data quality, data catalogs are another.

Using data catalogs, organizations can connect data across different domains such as finance and human resources; govern data across all domains to ensure it remains secure, consistent and properly used; and model and index data so it can be discovered and reused to inform analytics and AI tools.

As a result, Coalesce's acquisition of CastorDoc addresses a real need, according to Mohan.

"It's accurate to say data catalogs are needed for AI development, deployment and management," he said. "The main reason is that to get ready for AI, organizations are paying more attention to data curation, quality, lineage, sensitive data [and other] aspects of governance. If they want to do AI experimentation, they need to trust their data's accuracy."

Now that CastorDoc is part of Coalesce, CastorDoc's platform has been renamed Coalesce Catalog and will be integrated into Coalesce's platform in two phases.

First, data lineage, discoverability and AI-enabled metadata management will be added. Later, the full platform, including data governance, data observability, collaboration capabilities and data pipeline development, will be embedded into Coalesce's data transformation workflows.

Taking time to integrate Coalesce Catalog in two phases is wise, according to Donald Farmer, founder and principal of TreeHive Strategy.

"It shows serious thought about their roadmap," he said. "Far too often, acquisition announcements lack this kind of insightful and useful detail."

In addition, Coalesce and CastorDoc were both founded in 2020 and are relatively young companies, so it's important to combine the two carefully, Farmer continued.

"These are two fairly early-stage companies, which makes for an interesting integration scenario," he said. "It's unlikely that either one will have a dominant culture, but equally neither one will have so much baked into their platform that the sunk-cost effectively prohibits reworking, reuse or swapping out code and features where needed."

Ultimately, the combination has intriguing potential, according to Farmer.

"We could see something compelling emerge -- an automated catalog which understands transformation and can give a more complete view of metadata across the data landscape of an enterprise than a more traditional catalog, which only gives a static view of the data at rest in specific locations," he said. 

Regarding the impetus for acquiring CastorDoc, Petrossian cited the vendor's expansion plans. While the vendor initially specialized in data transformation, it always planned to broaden its platform to include other capabilities.

"From the start, we had ambitions to expand our platform's breadth and sought opportunities that had a strong product-market fit and mature technology but hadn't yet invested heavily in go-to-market," Petrossian said. "Data cataloging was a natural fit, offering strong synergy and a complementary solution."

Plans

After the acquisition of CastorDoc, Coalesce will better enable customers to use metadata to help develop and deploy analytics and AI tools, according to Petrossian.

However, even as it expands beyond data transformation, Coalesce remains integrated only with Snowflake and not with other data cloud or hyperscale cloud vendors, such as Databricks, AWS, Google Cloud and Microsoft.

Petrossian declined to specify whether Coalesce has plans to add integrations with other cloud data management providers, though he hinted that integrations are in the works.

"We'll have an announcement [about this] soon," he said.

Mohan, meanwhile, suggested that adding support for cloud data platforms beyond Snowflake is important if Coalesce is to grow and compete for market share.

"[They need] to expand into other cloud data ecosystems across hyperscalers," he said.

Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.
