Confluent, Databricks partner to simplify AI development

With a new integration, joint customers can more easily access real-time streaming data to develop, deploy and maintain trusted AI models and applications.

Databricks and Confluent unveiled a partnership that includes an integration aimed at making it easier for joint customers to use real-time streaming data to train and inform AI models and applications.

Based in San Francisco, Databricks pioneered the data lakehouse architecture for managing analytical data. In addition, its Data Intelligence Platform provides capabilities that enable customers to develop and deploy analytics and AI models and applications.

Confluent, meanwhile, is a streaming data specialist based in Mountain View, Calif., whose Data Streaming Platform is built on Apache Kafka and enables customers to access real-time operational data.

The bi-directional integration between the vendors connects Confluent's Tableflow, a tool that lets users quickly move operational Kafka data into data storage repositories as tables, and the Delta Lake table storage format developed by Databricks. In addition, it connects Confluent's Stream Governance with Databricks' Unity Catalog to unify data governance.

Using the integration -- unveiled on Feb. 11, two days before Databricks revealed a major integration with SAP -- joint customers can more easily discover and govern data across separate operational and analytical systems, making the AI development process more efficient.

As a result, the partnership between Databricks and Confluent is significant, according to Kevin Petrie, an analyst at BARC U.S.

"Between Confluent and SAP, Databricks is striking some big partnerships to strengthen its ecosystem," he said. "Confluent's Kafka portfolio gives Databricks users the real-time structured and semi-structured data they need for AI model training, prompting and inference."

Common AI applications such as machine learning, predictive analytics and chatbots fail without instant access to operational data, Petrie continued. Kafka weaves together such data, enabling AI models to understand events and trends.

"These data streams enrich other data within … so that companies can train, fine-tune and prompt their models with the right intelligence," Petrie said

The integration

With generative AI tools capable of making workers better informed and more efficient, overall investment in AI development is surging. However, many enterprises struggle to derive value from those investments, and part of their problem is the data used to train and inform AI tools.

One persistent problem is poor data quality, which leads to poor outputs and, ultimately, a lack of trust.

A recent study published by data management vendor Ataccama and Hanover Research found that only a third of 300 senior data leaders reported having meaningful success developing and deploying AI applications. More than two-thirds cited a lack of trusted data as the primary problem preventing success.

Similarly, discovering relevant data across sprawling IT systems that isolate data is a hindrance. A recent Databricks survey of more than 1,000 technologists showed that fewer than one-quarter are confident their data infrastructures can support AI applications.

The integration between Confluent and Databricks aims to ease some of the barriers to successful AI development by bringing real-time streaming data together with data from other systems in a single, governed location.

Given the confluence of capabilities, the partnership benefits joint Confluent and Databricks customers as well as customers of one vendor that can now more easily access the other's capabilities through the integration, according to Petrie.

"Confluent customers benefit because they integrate with an established lakehouse platform on which they can build and manage AI applications that consume their streaming data," he said. "Databricks customers benefit because they get easier access to real-time data that makes their models more accurate."

Databricks previously had data streaming capabilities. However, the integration adds connectors to hundreds of operational data sources, enabling joint customers to easily stream data into Databricks as Delta tables, according to Ori Zohar, Databricks' product marketing leader for data engineering.

In addition, although joint customers could previously connect Confluent and Databricks to stream data, converting Kafka logs to Delta tables required sophisticated, time-consuming engineering, according to Paul Mac Farland, senior vice president of partner and innovation ecosystem at Confluent.

Via Tableflow, operational Kafka logs can be converted to Delta tables so they can be moved into Databricks for data transformation, feature engineering and model training. Meanwhile, by also joining Stream Governance with Unity Catalog, the integration ensures that data moving between operational and analytical systems remains governed, traceable and compliant.
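
For context, the following is a minimal sketch of the kind of hand-built pipeline the integration is meant to replace: a Spark Structured Streaming job on Databricks that parses a Kafka topic and writes it out as a Delta table. The topic name, schema, broker address and table name are hypothetical, and the spark variable is the session that Databricks notebooks provide by default.

```python
# Hypothetical hand-built Kafka-to-Delta pipeline of the kind the
# Tableflow integration is meant to replace. All names are illustrative.
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructType, TimestampType

# Assumed schema for a JSON-encoded "orders" topic (hypothetical).
order_schema = (
    StructType()
    .add("order_id", StringType())
    .add("amount", DoubleType())
    .add("event_time", TimestampType())
)

# Read the raw Kafka stream; the broker address is a placeholder.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Kafka values arrive as bytes: decode, parse and flatten them by hand.
orders = (
    raw.select(from_json(col("value").cast("string"), order_schema).alias("o"))
    .select("o.*")
)

# Write the parsed stream to a Delta table, managing checkpoints manually.
(
    orders.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/orders")
    .toTable("main.streaming.orders")
)
```

Each of these steps -- schema definition, decoding, checkpointing -- is the kind of engineering the vendors say Tableflow now handles automatically.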

The intended result is discoverable, trusted and secure data that can be used to develop AI tools.

Beyond transforming real-time Kafka logs to Delta Lake tables and unifying data governance, the integration features the following:

  • Continuous data streaming, rather than batch file uploads, to speed development (a consuming-side sketch follows this list).
  • Improved model accuracy, with models informed by the latest data.
  • A flow that sends AI-driven insights back to operational systems, where businesses can automate responses rather than rely on manual processes.
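
As referenced in the first item above, here is a minimal sketch of what the consuming side might look like on Databricks, assuming a Tableflow-produced Delta table registered in Unity Catalog; the three-level table name and feature columns are hypothetical.

```python
# Hypothetical consumption of a Tableflow-produced Delta table registered
# in Unity Catalog; the table name and columns are illustrative only.
# The spark variable is the session Databricks notebooks provide by default.

# Read the governed table through Unity Catalog's catalog.schema.table namespace.
orders = spark.read.table("main.streaming.orders")

# The same table can also be consumed as a continuous stream, so downstream
# models see the latest records rather than periodic batch uploads.
orders_stream = spark.readStream.table("main.streaming.orders")

# Simple feature engineering ahead of model training (illustrative).
features = orders.selectExpr(
    "amount",
    "hour(event_time) AS order_hour",
)
features.show(5)
```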

While the integration benefits joint Confluent and Databricks customers investing in AI development, the vendors also benefit. Because the integration makes the platforms easier to use together, each vendor becomes more attractive to customers of the other. In addition, both become more attractive to greenfield customers who value open source technologies, according to Zohar.

"This new integration will be very appealing to customers who value building their solutions on proven open format technologies [such as] Apache Kafka, Delta Lake and Unity Catalog," he said.

The future

Following the launch of the first integration between Confluent and Databricks, the vendors plan to develop and release further integrations in the coming months, according to Mac Farland.

In addition, the vendors' sales and marketing teams will work together to encourage adoption of one another's capabilities, Mac Farland said.

Petrie, meanwhile, suggested that as Databricks continues to add integrations and expand its ecosystem, it should do more to support on-premises and hybrid infrastructures. While many AI projects are developed in the cloud, some enterprises choose to develop them in more controlled environments.

"To extend their addressable market, they should give customers the option to install Databricks on premises while integrating back to the main cloud platform," Petrie said. "Partners such as Confluent and SAP could help them do this."

Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.
