Ascend.io, Databricks integration improves data visibility

The update includes support for Databricks' Unity Catalog to enable joint customers to better organize and view datasets that can be used to inform data science and BI projects.

Ascend.io on Thursday launched an updated version of its integration with Databricks designed to better enable joint customers to view their data as well as share it and use it to collaborate.

This is the third generation of the integration between Ascend.io and Databricks, adding connectivity between Ascend.io and some of Databricks' newer capabilities.

Ascend.io's Data Automation Cloud enables users to develop data pipelines with automated ingestion, transformation, orchestration and observability capabilities. In late 2022, the vendor, founded in 2015, extended its partnership with Databricks rival Snowflake, a cloud data vendor, by offering free data ingestion for Snowflake customers.

Databricks, meanwhile, is a data lakehouse vendor whose platform combines the structured data management capabilities of data warehouses with the unstructured data management capabilities of data lakes.

Over the past two years, a major push for Databricks has been developing industry-specific versions of its lakehouse to simplify data management and data science for organizations in specific verticals. Most recently, the vendor launched its Lakehouse for Manufacturing in April, the fifth such industry-specific version of its platform.

The integration

Despite Databricks' recent efforts to simplify data management and data science, the vendor's customers sometimes have difficulty managing changes within their data pipelines, according to Kevin Petrie, an analyst at Eckerson Group.

As a data pipeline specialist, Ascend.io provides tools that address that exact problem, making Ascend.io and Databricks logical partners.

"Databricks users struggle to manage changes to pipeline code and data," Petrie said. "Ascend.io eases this pain with a control layer that spots and propagates these changes across the data lake in a controlled fashion."

Perhaps the most significant improvement the updated integration adds is native connectivity to the Databricks Unity Catalog, Petrie continued.

First generally available in June 2022, the Unity Catalog is a data catalog designed to help organizations connect their data, put data governance measures in place and track their data lineage. Data catalogs provide an important way for organizations to organize and oversee their data even as the volume and complexity of the data they ingest and manage increase.

The integration automatically catalogs all datasets created in Ascend.io in the Unity Catalog. That subsequently enables joint customers to search the datasets in their lakehouse and improves access to data that can be used to inform data projects.

"A key aspect of this announcement is the automated integration of Ascend data assets in Databricks Unity Catalog," Petrie said. "Data teams really want comprehensive views of all the data and metadata in their cloud environments, and this integration definitely helps."

Similarly, Sean Knapp, founder and CEO of Ascend.io, noted the importance of the connection between the vendor's platform and Databricks' Unity Catalog.

"The Unity Catalog is a huge part of Databricks' product offering and a huge part of their strategy with their customers. So we wanted to make sure that we supported that as a first-class citizen," he said.

In addition to the native connection between Ascend.io and Databricks' Unity Catalog, the updated integration includes support for Databricks' SQL compute platform so that developers using Ascend.io can more easily take advantage of Databricks' compute power in their lakehouse.

Finally, the integration adds the ability for joint customers to take advantage of Databricks' data transformation capabilities, such as job batching and data merging, within their Ascend.io environment.

Because joint customers can do their job batching and merging in Ascend.io rather than Databricks, the integration reduces the costs associated with moving data back and forth between different cloud platforms and saves the time it would normally take to send data between environments.

"Integrating Databricks transformations into the Ascend platform is also a step forward," Petrie said. "It further reduces the need for data engineers to toggle between multiple interfaces."

Knapp, meanwhile, said that overall, the updated integration is aimed at more closely combining the capabilities of Ascend.io and Databricks than previous iterations of their integration.

Joint customers ask that Ascend.io keep up with Databricks' latest innovations to enable them to use the two vendors' most up-to-date capabilities in concert with one another, he noted. The updated integration was spurred by those requests.

"It's a much tighter integration," Knapp said. "Before, we'd send a lot of workloads into Databricks. But now we're really getting in deep with the patterns of jobs and how to utilize our infrastructures. We did a lot of research on optimization."

Next steps

In addition to Databricks, Ascend.io maintains deep integrations with Snowflake and the Google BigQuery cloud data warehouse.

It also offers an integration with the Amazon Redshift cloud data warehouse, but the integration only enables joint users to write data back and forth and does not enable the pushdowns that eliminate the need to move data between environments.

Ascend.io does not yet provide an integration with Microsoft Azure. But an integration with the Azure Synapse cloud data warehouse and analytics service -- now part of Microsoft's new Fabric that combines various data tools -- could be next for the vendor, according to Knapp.

"As we look at where our customers are building out their next-generation architectures, most are users of Snowflake or Databricks," he said. "But we'll see more customers, over time, using the native Synapse. If I were to look at trends, I think Synapse is on the uptrend for adoption."

Regarding the next iteration of Ascend.io's integration with Databricks, Knapp said Ascend.io's planning engine, which determines which jobs should be run, will be connected to Databricks.

Other items on the vendor's roadmap include investing in helping customers migrate from older systems into a more automated realm and adding AI and automation capabilities aimed at enabling customers to speed and simplify data management.

"Our focus is on helping them automate more of the monotony," Knapp said.

Petrie, meanwhile, said Ascend.io's focus on AI to make data management both easier and faster is right for the vendor.

Numerous data management and analytics vendors have made AI their primary focus in the months since OpenAI launched ChatGPT in November 2022, a release that represented a leap in generative AI and large language model capabilities. Petrie said he's curious to see what Ascend.io's approach to incorporating AI will be.

"Knapp has an interesting vision of how large language models can accelerate data management processes and empower data engineers to be more strategic with data pipeline design," he said. "I look forward to seeing how Ascend.io's strategy evolves in this regard."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
