
Astronomer adds DBT Core support to data orchestration suite

The vendor's latest Astro platform update integrates the open source transformation tool to help users better prepare information for analytics and AI models and applications.

Astronomer on Wednesday unveiled its latest data orchestration release, which includes support for DBT Labs' open source platform and aims to provide customers with improved data pipeline performance and security.

Astro, the vendor's suite for readying data for analysis, is built on Apache Airflow, an open source data orchestration tool. Airflow provides more than 1,600 integrations with databases, AI frameworks and other platforms key to analyzing data and developing AI and analytics models and applications.

By adding support for DBT Core, the open source version of DBT Labs' data transformation platform, Astronomer is letting customers run Airflow and DBT together to get better data pipeline performance than they would running Airflow alone.

Data transformation is part of the overall orchestration process. While Airflow provides general-purpose capabilities for the entire orchestration process, DBT provides a specialized set of features that improve the transformation stage.

As a result, the Astronomer Astro update adds a useful capability for the vendor's users, according to Donald Farmer, founder and principal of TreeHive Strategy.

"Astronomer's integration of DBT into their Astro platform is quite significant," he said. "This integration means that the pipeline can be managed with a single set of tools rather than in parts. Now that Astro users can access these capabilities, they may be able to do more sophisticated and flexible data modeling within their pipelines."

Based in New York City, Astronomer is a 2018 startup that specializes in helping customers manage data pipelines to ready their data for analysis.

To date, the data orchestration vendor has raised $282.9 million in financing, including $213 million in March 2022. That round came just before funding for tech vendors tightened and tech stocks slid amid recession fears and the economic uncertainty caused by worldwide events such as the Russia-Ukraine War and repeated supply chain disruptions.

New capabilities

While enterprises have long recognized the importance of data to inform decisions, readying that data for analysis has always been a challenge.

Problems such as data isolation, data duplication, incomplete data and incorrect data hamper data quality, and without high-quality data, organizations can't rely on their data to inform decisions. Until the last decade or so, when most data was kept in on-premises databases and real-time decision-making was a luxury, enterprises could address data quality over time and use data products such as weekly, monthly and quarterly reports.

Now, however, business moves faster than ever and organizations need to be able to act and react as events happen. In addition, the volume of data organizations collect is increasing exponentially and the complexity of that data is also rising, making human oversight of data quality an impossibility.

In response, specialized vendors such as Astronomer, Acceldata, Monte Carlo and Rivery have emerged with AI-powered automation tools to help enterprises address data quality as they ready data to inform AI and analytics models and applications.

While Acceldata and Monte Carlo address data quality by enabling users to automate data observability as data moves through pipelines, data orchestration vendors including Astronomer and Rivery enable customers to automate the combining and organizing of data from disparate sources.

"The aim is to reduce data silos and give a unified view of data," Farmer said. "However, automation also improves and reduces manual intervention, saving time and resources while -- in theory -- minimizing errors."

As a result, data orchestration vendors such as Astronomer and Rivery address a real need as data volume and complexity increase and real-time insights are needed, he added.

While Astronomer specializes in data orchestration, DBT Labs specializes in transforming data from one format, such as a database file or Excel spreadsheet, to another so it can be combined with other data to inform AI and analytics tools.

Astronomer first added support for DBT Core to its support for Apache Airflow in 2023 in an open source package called Cosmos. The feature enables users to integrate DBT Core projects into Airflow with just a few lines of code.
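As a rough illustration of the idea behind running DBT models as orchestrated tasks, the sketch below orders a small set of models so that each one runs only after the models it depends on. It is not the Cosmos API. The model names and the helper functions `plan_tasks` and `build_commands` are hypothetical, and the only real interface it references is the DBT command line's `dbt run --select` option.

```python
from graphlib import TopologicalSorter

# Hypothetical DBT project: each model maps to the set of models it depends on.
MODEL_DEPENDENCIES = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "daily_revenue": {"orders_enriched"},
}


def plan_tasks(dependencies):
    """Return model names in an order that places every upstream model first."""
    return list(TopologicalSorter(dependencies).static_order())


def build_commands(dependencies):
    """Pair each model with a command that an orchestrator task could run."""
    return [f"dbt run --select {model}" for model in plan_tasks(dependencies)]


if __name__ == "__main__":
    for command in build_commands(MODEL_DEPENDENCIES):
        print(command)
```

An orchestrator such as Airflow would typically run independent models in parallel rather than in a single list, but the dependency ordering is the same.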

Now, Astronomer is adding support for DBT Core in Astro so that users can manage DBT Core and Airflow from a single interface that uses the same code deploy process for two distinct platforms that otherwise don't share the same code. The result is a reduction in the context switching required when using Airflow and DBT Core in Cosmos -- which adds to wasted time -- and an experience designed to improve efficiency and speed decision-making.

While saving time is one benefit of support for DBT Core in Astro, its main benefit is the full integration of a set of tools that improve the overall data orchestration process by better preparing disparate data types to be combined, according to Stephen Catanzano, an analyst at TechTarget's Enterprise Strategy Group.

"The DBT integration is highly significant," he said. "DBT has become a standard for data transformation and combining it with Airflow's orchestration capabilities creates a powerful and flexible data engineering stack. This integration simplifies complex data pipelines, improves collaboration and accelerates development cycles."

The impetus for combining Astronomer's support for Airflow with support for DBT Core came largely from Astronomer's customer feedback, according to Julian LaNeve, the vendor's chief technology officer.

He noted that Astronomer has an integration with DBT Cloud, which is DBT Labs' fully managed service, but that many DBT users are migrating off Cloud and onto Core instead. Those Astronomer customers using Airflow and DBT Core together, therefore, asked for more seamless integration between the open source tools.

"We're at a point where we want to start making consolidation plays around making sure our customers can simplify their vendor relationships and run everything on one platform using the same infrastructure," LaNeve said. "That was where the feedback from customers around DBT in particular was helpful."

Customers told Astronomer that the pre-existing experience was "fine," he continued, but that it could be better.

Beyond adding support for DBT Core in Astro, Astronomer's latest platform update includes the following features aimed at improving the data orchestration process:

  • Universal Metric Export, a tool that lets customers export metrics to Prometheus, an open source platform that -- among other capabilities -- enables users to view metrics across the various platforms that make up their data stack so they can take proactive measures to ensure the health of their data.
  • Self Healing Workers, a feature that monitors Airflow infrastructures to find and stop idle processes so systems aren't running unnecessarily.
  • Astro Terraform Provider, a tool that simplifies managing Airflow infrastructures by using HashiCorp's Terraform infrastructure management capabilities to automate tasks and ensure consistent, scalable deployments.
  • Customer Managed Workload Identity on AWS, a feature that enables governed access to AWS data services to improve security and compliance; Astronomer provides similar governance tools for Microsoft Azure and Google Cloud.
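For readers unfamiliar with Prometheus, the sketch below shows the kind of plain-text format Prometheus reads when it collects metrics. It is not Astronomer's exporter. The function `to_prometheus_text`, the metric name and the labels are hypothetical, and the sketch assumes label values contain no characters that need escaping.

```python
def to_prometheus_text(name, help_text, samples):
    """Render gauge samples in the Prometheus text exposition format.

    `samples` is a list of (labels, value) pairs, where `labels` is a dict.
    Label values are assumed to need no escaping in this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is stable from run to run.
        label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical pipeline metric with illustrative values.
    print(
        to_prometheus_text(
            "pipeline_task_duration_seconds",
            "Duration of the most recent task run.",
            [({"dag": "daily_revenue", "task": "transform"}, 12.5)],
        ),
        end="",
    )
```

Because the format is plain text, metrics from different tools in a data stack can be gathered and viewed in one place.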

While the efficacy of the new features depends on how well they're developed, Self Healing Workers and Universal Metric Export hold promise, according to Farmer.

"The pipeline resilience features are promising," he said. "Self Healing Workers, if well implemented, could address a joint irritation for Airflow users by managing idle processes and stuck tasks. Similarly, the Universal Metrics Export solves an irritation but should also be very valuable for organizations optimizing large and complex orchestrations."

Catanzano similarly highlighted Universal Metrics Export and Self Healing Workers.

"Universal Metrics Export provides essential visibility into data pipelines, enabling proactive monitoring and troubleshooting, [while] Self Healing Workers enhances pipeline resilience," he said.

LaNeve, meanwhile, pointed to Astro Terraform Provider as a significant addition, noting that Terraform is gaining popularity with large enterprises deploying the infrastructure management capabilities to standardize code across different departments.

"It's something our customers have been asking for," he said.

Customer feedback provided the motivation for developing most of the new Astro features, LaNeve continued. The exception is Self Healing Workers, which resulted from Astronomer's attempt to find ways to proactively help users.

Looking ahead

As Astronomer evolves, one of its goals is to add support for more tools such as DBT Core that enable customers to use specialized tools to augment Airflow's general-purpose data orchestration capabilities, according to LaNeve.

"As our customers give more feedback around where they want us to step in and help, we'll pay active attention and continue to look for opportunities," he said.

Beyond data orchestration, Astronomer plans to expand into data observability, LaNeve continued.

The vendor holds a lot of telemetry data based on its customers' use of Astro. That telemetry data provides insight into data reliability that can be used to help Astronomer develop data observability capabilities.

"Today, we give users the operations to make sure everything runs on time," LaNeve said. "In the future, we want to give you the observability to build trust and confidence."

Catanzano noted that Astronomer's plan to support additional tools through integrations is a sound strategy. In addition, advanced analytics, machine learning and improved collaboration capabilities all are areas in which the vendor could expand beyond data orchestration, he said.

Farmer, meanwhile, suggested that Astronomer focus on upgrading its tools to better handle the scale and complexity of enterprise deployments. For example, the startup vendor could add advanced compliance and policy management tools, more data governance capabilities and multi-region support.

"They must improve enterprise-level support for large-scale, complex … environments," Farmer said.

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
