New Databricks tools tackle lingering GenAI accuracy issues

Features such as centralized model governance and real-time monitoring aim to improve the accuracy of outputs so enterprises can confidently scale beyond experimentation.

Databricks on Monday unveiled new features, including governance and monitoring capabilities, designed to enable customers to scale generative AI beyond pilot projects and low-risk applications.

While studies show widespread enterprise interest in developing and deploying generative AI tools, they also show that concerns about the accuracy of generative AI outputs and about data security prevent many organizations from putting such tools into production.

For example, a recent survey by Deloitte found that only 30% of respondents expect current generative AI experiments to be fully scaled by the midpoint of the year, with mistakes leading to real-world consequences cited as the top barrier.

To alleviate such concerns, Databricks introduced centralized governance for all AI models through the AI governance framework Mosaic AI Gateway and real-time performance observability with Lakehouse Monitoring for Agents, among other features.

The tools -- now in various stages of testing and preview -- do address concerns related to generative AI accuracy, according to Andy Thurai, an analyst at Constellation Research. However, whether enterprises want to use Databricks for governance, observability and other features that address accuracy remains to be seen.

"[The] features address the enterprise adoption of GenAI, for sure," Thurai said. "But … almost every platform provider, every hyperscaler, every model provider and many startups are working to provide similar solutions. While Databricks has a leg up with its data platform, I'm not sure they will fully convert those with enterprise AI needs to their platform."

Based in San Francisco, Databricks is a data platform vendor that helped pioneer the data lakehouse architecture for data management. Over the past two years, the vendor has expanded to prioritize AI development. Competitors include archrival Snowflake and tech giants such as AWS, Google Cloud and Microsoft.

New capabilities

Given generative AI's potential to make workers better informed and more efficient, many enterprises have increased their investments in AI development since OpenAI's November 2022 launch of ChatGPT marked a significant improvement in generative AI capabilities.

However, the accuracy of generative AI outputs has been a concern throughout the recent surge of interest. Even with the requisite volume of high-quality data, AI hallucinations can still occur.

In addition, because generative AI models need to be combined with an organization's proprietary data to understand that organization's operations, data security is a concern.

Therefore, despite enterprises wanting widespread use of generative AI to better inform workers and automate some processes, many use generative AI only on a small scale for internal applications, according to Stephen Catanzano, an analyst at Enterprise Strategy Group, now part of Omdia.

For example, enterprises are deploying generative AI chatbots that assist employees but are hesitant to deploy agents that can autonomously take on certain tasks.

"Enterprises are primarily using GenAI for low-risk internal use cases due to concerns over accuracy, governance and security," he said. "The fear of financial and reputational risks, along with challenges in integrating GenAI with enterprise data, is holding them back from major initiatives."

Databricks' new capabilities include the following features:

  • Custom LLM provider support in Mosaic AI Gateway, now in public preview, so customers can govern all their AI models in a central location, including open source and proprietary SaaS models.
  • Lakehouse Monitoring for Agents, a feature in beta testing that deploys MLflow Tracing and LLM judges so users can track the performance of AI agents.
  • An API in public preview that enables developers to integrate Genie, a conversational interface that lets users interact with data using natural language, into custom-built applications and productivity platforms.
  • Batch inferencing capabilities with Mosaic AI Model Serving to simplify the infrastructure needed to integrate unstructured data and train models.
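
The governance value of a gateway comes from routing every model call through one endpoint where policies can be enforced, rather than calling each provider directly. Databricks model serving endpoints accept OpenAI-style chat payloads over REST at an `/invocations` path; the sketch below shows how an application might construct such a request. The workspace URL, endpoint name and token are hypothetical placeholders, not real identifiers, and this is an illustrative sketch rather than official usage.

```python
import json

# Placeholders for illustration only -- not real Databricks identifiers.
WORKSPACE_URL = "https://example-workspace.cloud.databricks.com"
ENDPOINT = "my-gateway-endpoint"  # a model endpoint governed centrally

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for a serving endpoint.

    Because every call targets the same governed endpoint, policies such as
    rate limits, credential management and payload logging can be applied
    in one place, regardless of which underlying model serves the request.
    """
    return {
        "url": f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT}/invocations",
        "headers": {
            "Authorization": "Bearer <personal-access-token>",  # placeholder
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        }),
    }

request = build_chat_request("Summarize last quarter's sales.")
print(request["url"])
```

Swapping the model behind the endpoint, open source or proprietary, leaves this client code unchanged, which is the practical appeal of centralized governance for distributed teams.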

Each feature is purposeful, with centralized AI governance potentially the most significant, according to Thurai.

"Centralized governance for all AI models is an interesting solution," he said. "Integrating and managing both open source and proprietary SaaS models in one place and the ability to set governance policies centrally can be compelling for large enterprises which have distributed units that work independently on AI models and consumption."

In addition, real-time monitoring could spur more enterprise adoption of generative AI, Thurai added.

Catanzano noted that the new capabilities address some -- though not all -- of the concerns preventing more widespread enterprise use of generative AI. Like Thurai, he highlighted Lakehouse Monitoring for Agents for addressing accuracy and Mosaic AI Gateway's unified governance for AI models.

"These directly address key enterprise concerns around control, reliability and compliance," Catanzano said.

Regarding the impetus for the new features, customer feedback played a prominent role, according to Craig Wiley, Databricks' senior director of product.

"Our customers are excited by AI agents' potential but scaling them effectively while ensuring quality and governance remains a challenge," he said. "Even the most advanced GenAI models struggle to deliver business-specific, accurate, and well-governed outputs, largely because they lack awareness of relevant enterprise data."

Plans

While the new capabilities address customer concerns related to accuracy and performance, there are other ways Databricks can improve its AI development and management capabilities, according to Catanzano.

For example, model fine-tuning is one area the vendor could improve. Adding industry-specific AI applications, boosting explainability and bias detection, and developing automated AI agent lifecycle management are others.

"These advancements would further build enterprise confidence in deploying AI for mission-critical applications," Catanzano said.

Databricks' actual plans include additional features that simplify developing, governing, deploying and evaluating AI agents, according to Wiley. Among them are tools to develop domain-specific agents such as those Catanzano suggested.

Thurai, meanwhile, noted that while Databricks has a strong AI development suite relative to its competition, the suite is difficult to navigate. In addition, while the vendor has consistently added capabilities over the past two years, it could provide more integrations that let customers customize their AI development stack.

"Things like usability and user experience have been a major customer complaint," Thurai said. "And integration with more third-party tools such as vector databases, better optimization of AI workloads, fast and efficient analytics for hybrid environments and cost efficiency are areas they could improve."

Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.
