
New Starburst, DBT integration eases data transformation

The integration enables data mesh adopters to work with data from multiple sources without having to move it in and out of a central repository for transformation and analysis.

Starburst unveiled an integration with DBT Cloud designed to enable joint customers to develop data pipelines that connect multiple data sources without turning to the expensive and time-consuming extract, transform and load process.

The integration, launched on April 27, sits as an adapter in DBT Cloud and connects to Starburst Galaxy, the vendor's SaaS offering.

Starburst is a data management and analytics vendor specializing in data mesh, which is a decentralized approach to analytics that encourages different domains within organizations to manage their own data within their own data lakes.

The intent is to take advantage of the domain expertise of users, with -- for example -- finance experts taking ownership of their organization's finance data and marketing experts overseeing their organization's marketing data.

In addition, data mesh is designed to eliminate the bottlenecks and wait times that inevitably result when all of an organization's data is overseen by a centralized team.

DBT Labs, meanwhile, is the leading vendor behind the open source DBT tool that enables engineers to transform data. A host of data management and analytics vendors have developed integrations with DBT to address data transformation, including Alation, Fivetran and ThoughtSpot.

Recently, DBT Labs acquired analytics vendor Transform to enhance DBT's semantic modeling capabilities.

The integration

Many organizations store their data in numerous locations rather than a single cloud data warehouse. Those locations can include anything from a modern data lake or lakehouse to a traditional on-premises database.

Traditionally, building data pipelines to connect those disparate data sources required data engineers to manually extract, transform and load (ETL) data over and over again -- a cumbersome and expensive process.

The integration between Starburst and DBT, however, provides joint users with a single control environment where they can bring data together from their multiple sources while eliminating traditional ETL.

The result saves the time and cost associated with ETL and provides access to new data before it moves into a data repository, which enables more real-time analysis.
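
The idea of transforming data where it sits, rather than extracting and loading it into a central repository first, can be illustrated with a minimal Python sketch. It does not use Starburst Galaxy or DBT Cloud. Instead, it assumes two local SQLite files stand in for a finance source and a marketing source, and the table names, column names, sample values and function names are all hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def build_sources(directory):
    """Create two independent SQLite files that stand in for separate data sources."""
    finance_path = os.path.join(directory, "finance.db")
    marketing_path = os.path.join(directory, "marketing.db")

    # Hypothetical finance source: revenue recorded per campaign.
    with closing(sqlite3.connect(finance_path)) as conn:
        conn.execute("CREATE TABLE revenue (campaign_id INTEGER, amount REAL)")
        conn.executemany(
            "INSERT INTO revenue VALUES (?, ?)",
            [(1, 1200.0), (1, 300.0), (2, 450.0)],
        )
        conn.commit()

    # Hypothetical marketing source: campaign names.
    with closing(sqlite3.connect(marketing_path)) as conn:
        conn.execute("CREATE TABLE campaigns (campaign_id INTEGER, name TEXT)")
        conn.executemany(
            "INSERT INTO campaigns VALUES (?, ?)",
            [(1, "spring_launch"), (2, "summer_sale")],
        )
        conn.commit()

    return finance_path, marketing_path


def revenue_by_campaign(finance_path, marketing_path):
    """Join and aggregate across both sources without copying rows into a new store."""
    with closing(sqlite3.connect(":memory:")) as conn:
        # ATTACH makes both files queryable from one session; no rows are moved.
        conn.execute("ATTACH DATABASE ? AS finance", (finance_path,))
        conn.execute("ATTACH DATABASE ? AS marketing", (marketing_path,))
        rows = conn.execute(
            """
            SELECT c.name, SUM(r.amount) AS total_revenue
            FROM marketing.campaigns AS c
            JOIN finance.revenue AS r ON r.campaign_id = c.campaign_id
            GROUP BY c.name
            ORDER BY c.name
            """
        ).fetchall()
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        finance_db, marketing_db = build_sources(workdir)
        for name, total in revenue_by_campaign(finance_db, marketing_db):
            print(name, total)
```

In this sketch, the single SQL statement plays the role of a transformation model, and the in-memory session plays the role of the query layer that reaches into each source. A production query engine does this across far more varied systems, but the contrast with ETL is the same: the join runs against the sources directly instead of against copies loaded into one repository.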

Those results stemming from the combination of Starburst and DBT are significant, according to observers.

"Starburst helps enterprises query data across enterprise environments that -- despite efforts at consolidation -- remain highly distributed while DBT helps analytics engineers and data engineers prepare data for analytics," said Kevin Petrie, an analyst at Eckerson Group. "When you combine these capabilities, you help enable data teams to transform data wherever it sits."

Using the capabilities of Starburst and DBT Labs in tandem, joint customers can spread their data across distributed environments and then query and analyze that data, he continued.

"So Starburst and DBT's partnership aims to help enterprises increase the amount of data they analyze for a given project," Petrie said.

Stephen Catanzano, an analyst at TechTarget's Enterprise Strategy Group, similarly noted that the significance of the integration lies in enabling Starburst customers to use DBT to work with data spread across multiple locations.

"DBT [offers] a way to develop, test, schedule, and investigate data models all in one place to accelerate building new analytics models," he said. "Starburst customers can leverage [DBT's] query engine and data transformation to utilize these new models across all locations."

While Starburst has largely focused on data mesh, the vendor has noticed many of its customers using its platform for data preparation and data modeling as well, noted Matt Fuller, Starburst's co-founder and vice president of product.

That customer activity led the vendor to develop its integration with DBT, he continued. Previously, Galaxy customers had to manually run DBT Core, requiring them to dedicate substantial engineering resources to data transformation.

"Customers wanted this integration with DBT Cloud in order to offload that engineering cost," Fuller said.

Beyond the benefits to joint customers, the integration between Starburst and DBT will likely result in marketing opportunities for both vendors, according to Catanzano.

While the integration is of greater technological benefit to Starburst than DBT, both vendors might benefit from exposure to new customers, he noted.

Petrie also said that there will likely be marketing opportunities for both vendors in addition to the technology gained by Starburst users.

Starburst and DBT have each been garnering increased interest among analytics consumers as data mesh becomes more accepted and, with the volume and complexity of data collected by organizations on the rise, advanced data transformation capabilities become more vital.

"Both Starburst and DBT have real market traction, [and they] expand their addressable markets with this partnership because Starburst specializes in large enterprises and DBT specializes in cloud-native startups," Petrie said.

A sample screenshot from Starburst shows a dataset's metadata.

Future plans

Beyond its new integration with DBT, Starburst recently upgraded its platform to make it easier to discover data.

The vendor's February 2023 update included the addition of an automated data catalog and a new tool titled Warp Speed that automates the indexing and caching of data and results in substantially faster queries.

Moving forward, Fuller said Starburst plans to offer greater support for Python. The vendor already supports SQL, but as more users show interest in Python, the vendor wants to meet their needs. Ease of use will also be a focus, Fuller added.

"We've been very focused on improving data consumption for non-technical users," he said. "To that end, we're continually building out a feature set that supports the building and sharing of data products."

One trend Starburst has not yet addressed is developing an integration with a generative AI tool, Catanzano noted.

Sisense, Pyramid Analytics and ThoughtSpot are among those that have developed integrations with OpenAI since its release of ChatGPT in late 2022, while Databricks developed its own generative AI tool.

Therefore, with Starburst's integration with DBT demonstrating that it can respond quickly to market trends, Catanzano suggested an integration with a generative AI tool might be something the vendor considers moving forward.

"This integration shows them as a reactive company and likely continuing to integrate more advanced AI/analytics features into their product like ChatGPT, which they can access with their query engine," he said.

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
