
New Databricks tools target AI quality, cost and security

The vendor's latest features aim to help customers improve model accuracy by securely developing compound systems that include multiple language models and RAG pipelines.

Databricks on Wednesday unveiled new features designed to enable enterprise customers to securely and cost-effectively develop generative AI models and applications that deliver high-quality outputs.

Among the new tools is the Mosaic AI Agent Framework, which targets model quality by enabling users to build retrieval-augmented generation (RAG) applications using foundation models with proprietary data. Another is Mosaic AI Gateway, an AI governance framework that addresses cost and security as well as quality.

In addition, Databricks revealed that as of Wednesday its Unity Catalog, a data catalog for data and AI governance, is open source.

Databricks introduced the new capabilities and Unity Catalog's new status during its Data + AI Summit, a user conference in San Francisco.

Collectively, the new features address the fragmentation that has long been part of data and analytics architectures, according to David Menninger, an analyst at ISG's Ventana Research.

Just as Databricks targeted fragmentation by helping to develop the data lakehouse format that combines the capabilities of data warehouses and data lakes, it is in the process of joining previously disparate data management, AI and analytics capabilities in one environment.

"Databricks has established enough of a presence in the data platform market that they have the opportunity to reduce fragmentation," Menninger said. "The lakehouse was the first step. ... Now, they are helping to ensure that all types of analytics -- including BI, AI and generative AI -- can be conducted on the same backplane."

Based in San Francisco, Databricks is one of the pioneers of the data lakehouse storage format that combines the structured data storage capabilities of data warehouses with the unstructured data storage capabilities of data lakes.

Over the past 18 months, Databricks has expanded the breadth of its platform to include an environment for developing generative AI models and applications. Key to that environment was the June 2023 acquisition of MosaicML, which now forms the foundation for Databricks' AI and machine learning operations.

Addressing AI quality

Databricks has been aggressive in its creation of an AI and machine learning development environment over the 18 months since OpenAI's launch of ChatGPT marked a significant improvement in generative AI capabilities.

Generative AI has the potential to make data management and analytics easier and more efficient through true natural language processing that greatly reduces the need to know and write code. As a result, enterprises are eager to develop generative AI models and applications that understand their organization. In response, many data management and analytics vendors have introduced tools such as AI assistants that provide generative AI capabilities, and others such as vector search and integrations with large language models (LLMs) that enable generative AI development.

In addition to its acquisition of MosaicML, Databricks has made three other acquisitions aimed at helping customers build AI and machine learning applications. It also introduced a spate of new capabilities, including an open source LLM, vector search and AI governance.

As Databricks has built up its AI and ML development environment, it has noticed a shift in the way enterprises are building AI models and applications, according to Joel Minnick, the vendor's vice president of marketing.

Initially, customers were using their data in concert with a single LLM to try to gain insight into their organization. The results, however, were uninspiring, with models and applications frequently delivering inaccurate outputs and taking a long time to respond to prompts.

Subsequently, customers began building what Databricks calls compound systems, which combine proprietary data with multiple LLMs and other systems such as RAG pipelines. Databricks has found that with compound systems, model accuracy rises significantly and response times get much faster, Minnick said.

Databricks recently made vector search -- a key component of RAG pipelines -- generally available to help customers develop compound systems.

The new capabilities unveiled on Wednesday are similarly designed to enable the compound system development that results in quality outputs, according to Minnick. In addition, they are designed to enable that compound system development securely and cost-effectively.

"As we've thought about what we want to invest in from a Databricks perspective, it's that we want to make it easier for customers to start building these compound systems," Minnick said.

Perhaps the most significant new capabilities are the Mosaic AI Agent Framework and Mosaic AI Gateway, according to Menninger.

The Mosaic AI Agent Framework is a software development kit for developing RAG pipelines that discover data relevant to a prompt and deliver it to an AI model or application as context for generating a response. Included is Mosaic AI Agent Evaluation, an AI-powered tool that measures the quality of outputs and enables users to give feedback through an intuitive user interface.
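
For readers unfamiliar with RAG, the retrieval step can be illustrated with a minimal Python sketch. It is not the Mosaic AI Agent Framework API, which the announcement does not detail. The function names, the sample documents and the simple word-overlap scoring are all assumptions made for illustration; production pipelines typically use vector search over embeddings instead.

```python
import math
import re
from collections import Counter

# Hypothetical proprietary documents; a real pipeline would index far more.
DOCS = [
    "Refunds are issued within 14 days of a return request.",
    "The Berlin office is closed on public holidays.",
    "Enterprise support tickets receive a response within 4 hours.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Place the retrieved documents in the prompt as context for the model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Calling build_prompt("How fast are refunds issued?", DOCS, k=1) yields a prompt that carries only the refund policy, which a language model would then use to answer the question.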

Mosaic AI Gateway, meanwhile, is a governance tool that enables users to query, manage and deploy models and applications. Using the feature, customers can easily change the LLMs that power their models and applications as new LLMs emerge and their performance surpasses that of existing LLMs. In addition, administrators can track usage, set rate limits to control spending, and filter for sensitive data such as personally identifiable information to address security and compliance.
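
The three controls described here (swappable models, rate limits and sensitive-data filtering) can be pictured with a small Python sketch. It is a conceptual illustration only, not the Mosaic AI Gateway interface: the Gateway class, its methods, the regular expressions and the callable "backend" standing in for an LLM are all hypothetical.

```python
import re
import time
from collections import defaultdict, deque

# Simplified patterns for illustration; real PII detection is far broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text):
    """Mask email addresses and US Social Security number patterns."""
    return SSN.sub("[SSN]", EMAIL.sub("[EMAIL]", text))

class RateLimitExceeded(Exception):
    """Raised when a user exceeds the allowed calls in the current window."""

class Gateway:
    def __init__(self, backend, max_calls, window_seconds, clock=time.monotonic):
        self.backend = backend            # callable standing in for an LLM
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock                # injectable so tests are deterministic
        self.calls = defaultdict(deque)   # per-user timestamps of recent calls
        self.usage = defaultdict(int)     # per-user total of accepted calls

    def set_backend(self, backend):
        """Swap the underlying model without changing any calling code."""
        self.backend = backend

    def query(self, user, prompt):
        now = self.clock()
        recent = self.calls[user]
        # Discard timestamps that have aged out of the sliding window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_calls:
            raise RateLimitExceeded(user)
        recent.append(now)
        self.usage[user] += 1
        # Redact sensitive data before the prompt reaches the model.
        return self.backend(redact(prompt))
```

Because applications call the gateway rather than a specific model, replacing the backend is a single call to set_backend, while the usage counter and rate limit give administrators a handle on spending.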

"Some of the biggest [generative AI development] challenges continue to be accuracy and governance," Menninger said. "The Mosaic AI Agent Framework will help increase the quality of outputs from generative AI. In addition, the Mosaic AI Gateway provides a governance framework that can span open source and proprietary models."

Kevin Petrie, an analyst at BARC U.S., similarly highlighted the Mosaic AI Agent Framework as an important new feature.

He noted that the emphasis on compound systems is appropriate given that language models are only one component of a broader system for developing generative AI models. Other components such as vector databases and RAG pipelines are just as significant. Therefore, a framework for developing RAG pipelines is a needed feature.

"The Mosaic AI Agent Framework is a critical and necessary step for Databricks," Petrie said. "RAG has become the most logical and cost-effective method of feeding language models domain-specific data and improving model accuracy. The more you can help ... implement RAG, the faster you can move early adopters from experimentation and pilots to production rollouts."

More capabilities

In addition to the Mosaic AI Agent Framework and Mosaic AI Gateway, new Databricks features designed to help customers develop compound systems include the following:

  • Unity Catalog GenAI Tools, a feature that enables customers to govern, share and register tools using the Databricks Unity Catalog so that they can be discovered across the organization in a secure and governed manner.
  • Mosaic AI Model Training, a tool that enables users to fine-tune open source foundation models with an organization's proprietary data. The resulting domain-specific model can inform decisions specific to that organization more cost-efficiently than larger models such as ChatGPT and Google Gemini.
  • Open sourcing Unity Catalog to provide an open ecosystem for both data and AI governance across clouds, data formats -- including Apache Iceberg and Apache Hudi -- and data platforms.
  • Partnerships and integrations with vendors including Nvidia, Informatica, Precisely and Qlik.

Unity Catalog GenAI Tools is in private preview, while all the other features unveiled on Wednesday -- Mosaic AI Agent Framework, Mosaic AI Agent Evaluation, Mosaic AI Model Training and Mosaic AI Gateway -- are in public preview.

Databricks did not provide a timeline for general availability.

Given that so many capabilities are in preview -- both features introduced by Databricks as well as many unveiled by competitors such as AWS, Google, Microsoft, Oracle and Snowflake -- it's difficult to know which vendors are providing the most advanced environments for AI development, according to Menninger.

"It's great to see these new announcements," he said. "They are certainly steps in the right direction. But enterprises need the reliability and support of generally available capabilities."

Petrie likewise said the new Databricks features will help customers develop generative AI models and applications.

He noted that the vendor's customer base includes teams versed in data science that can take advantage of the capabilities Databricks is planning to provide. As a result, the vendor is on the right track with its product development plans. But for the new tools to truly be effective, they need to be generally available rather than in preview.

"As with Snowflake [during its recent user conference], Databricks is announcing software that remains in public or private preview," Petrie said. "The real test is moving into general availability sooner rather than later."

As for which vendor has so far developed the most functional environment for generative AI development, it's too soon to tell, he continued. Databricks, Snowflake, AWS, Google, Oracle and Microsoft have all revealed significant product development plans, but none has built a complete generative AI development environment.

"It's too soon to declare a winner, or even an early leader among vendors in the GenAI arms race," Petrie said. "But because GenAI will ultimately be more of a feature than a standalone initiative, the winners ultimately will be those that can help companies build GenAI into what they already have [such as] applications and existing architectures."

Regarding the impetus for developing features such as the Mosaic AI Agent Framework and Mosaic AI Gateway, customer feedback was a motivator, according to Databricks' Minnick.

Conversations with customers consistently revealed that enterprises were experimenting with generative AI, but struggling to move models and applications into production. Customers also consistently revealed that problems hampering generative AI development centered around quality, cost and privacy.

"As we thought about the last 12 months of our roadmap, we thought about how to help customers get their arms around [quality, cost and privacy]," Minnick said. "In tandem with that was this rise of compound systems."

Next steps

While the new capabilities add greater functionality to Databricks' environment for AI and ML development, the vendor is not finished providing tools aimed at making it easier and more efficient for customers to develop generative AI tools, according to Minnick.

"What we always want to do better is make it easier and easier for customers to build these models with assurance that their data is going to be safe," he said. "We're always focused on how to put more ease of use into the system and make it even faster to build these things than we're making it today."

In particular, model fine-tuning will be an area of greater emphasis for Databricks, Minnick added.

Menninger, meanwhile, said Databricks could do more to enable customers to build traditional AI and ML models and applications in concert with generative AI.

He noted that ISG's research shows that enterprises allocate half of their AI budgets to generative AI and half to traditional AI and ML. Vendors, however, have placed far more emphasis on generative AI over the past 18 months.

"I'd like to see additional investments that help bring the two worlds together," Menninger said. "It's still too hard to create traditional AI and ML models, but I anticipate that the combination of generative AI with AI and ML will help make robust, reliable, predictive analyses more accessible."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
