
New AI-powered tools from Dremio target data discovery

With relevant data key to developing accurate AI and analytics applications, the data lakehouse specialist's latest features aim to make data faster and easier to find.

Dremio on Tuesday launched new features including query optimization capabilities and AI-powered semantic search aimed at enabling customers to more easily and efficiently discover the data needed to inform analytics and AI applications.

In addition, the vendor added automated data clustering for the Apache Iceberg open table storage format and released its Apache Polaris catalog for data governance.

Dremio unveiled its new features during Iceberg Summit 2025, a conference for Iceberg users being held in person on April 8 in San Francisco with a virtual component on April 9.

Dremio's update is a significant addition for the vendor's users because it addresses discovering the relevant data used to train AI applications, such as assistants and agents, that are aimed at enabling widespread use of analytics, according to William McKnight, president of McKnight Consulting Group.

"I find it pretty compelling in terms of making analytics more accessible, [which is] a big ask from enterprises today," he said. "Dremio's release introduces several key enhancements aimed at improving data discovery and accelerating analytics and AI initiatives [that] collectively contribute to a more efficient and effective data and analytics framework."

Based in Santa Clara, Calif., Dremio is a data lakehouse specialist whose platform is built for storing Iceberg tables. Because lakehouses enable users to combine structured and unstructured data -- and AI tools require large volumes of high-quality data for accuracy -- they are one of the preferred storage formats for AI development pipelines.

New capabilities

Enterprises are rapidly increasing their investments in generative AI, given its potential to make workers better informed and more efficient.

To understand the unique characteristics of individual businesses and address the various needs of employees doing different jobs within the enterprise, generative AI applications need to be trained using the enterprise's relevant proprietary data.

Data discovery, therefore, is a critical part of AI development. And because the relevant data needed to train a model or application geared toward a specific task can be difficult to find amid millions of rows of data, tools that make data easier to discover serve as accelerators for AI developers and engineers.

Dremio's new capabilities were developed based on a combination of customer feedback and the vendor's recognition of barriers making it difficult for enterprises to discover relevant data, according to Tomer Shiran, its founder and chief product officer.

"Companies today need a platform that provides access to any data source, automatically, with high performance, and the ability to discover data and understand its meaning," he said.

Query optimization capabilities called Autonomous Reflections aid data discovery by ensuring that queries are run on live, up-to-date data without requiring changes to the queries' SQL structure, changes that could affect data quality. Each Autonomous Reflection is an automatically generated monitor that eliminates the need to manually tune queries for AI and analytics workloads.

Iceberg Clustering automates data layout optimization, which eliminates the need for humans to partition tables -- or organize similar rows of data into files -- to speed queries and lower costs associated with manual tuning.
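To illustrate why organizing similar rows into the same files can speed queries, here is a minimal Python sketch. It is not Dremio's or Iceberg's implementation. It assumes a simplified model in which each "file" records only the minimum and maximum of one column, and a query engine skips any file whose range cannot contain matching rows. The names `write_files` and `files_scanned` are hypothetical helpers invented for this example.

```python
import random


def write_files(rows, rows_per_file):
    """Split rows into fixed-size 'files' and record min/max stats per file."""
    files = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        files.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return files


def files_scanned(files, lo, hi):
    """Count files whose [min, max] range overlaps the query range [lo, hi]."""
    return sum(1 for f in files if f["max"] >= lo and f["min"] <= hi)


# Hypothetical data: 1,000 distinct values arriving in random order.
random.seed(0)
values = list(range(1000))
random.shuffle(values)

# Unclustered: rows are written in arrival order, so each file spans a wide range.
unclustered = write_files(values, 100)

# Clustered: similar rows are grouped together before being written.
clustered = write_files(sorted(values), 100)

# A range query for values 200 through 249.
print("unclustered files scanned:", files_scanned(unclustered, 200, 249))
print("clustered files scanned:", files_scanned(clustered, 200, 249))
```

In this toy model, the clustered layout lets the query read a single file, while the unclustered layout typically forces it to open most or all of them. Automating that layout decision is the kind of work the feature is described as taking off users' hands.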

AI-enabled semantic search aims to reduce data discovery time from days to seconds by enabling both humans and autonomous AI agents to search for existing data assets using an organization's standardized business terminology.

Lastly, Polaris Catalog, Dremio's open data catalog built on the Apache Polaris catalog for Iceberg tables, provides users with fine-grained access controls for security, data lineage to address data quality, and governance capabilities to ensure proper organizational use of data.

"Each feature addresses a specific pain point -- reducing manual performance tuning, simplifying data discovery, enhancing governance and optimizing data layouts -- all working together to transform data workflows from months to minutes," Shiran said.


Perhaps the most significant new features are Autonomous Reflections and AI-enabled semantic search, according to McKnight.

Autonomous Reflections aim to save significant time and expense by eliminating manual tuning, while AI-enabled semantic search aims to do the same by cutting data discovery time. In addition, by allowing users to search for data using familiar terminology, AI-enabled semantic search has the potential to enable more nontechnical employees to work with data.

"These two features tackle major pain points in data management and AI development today: performance and discovery," McKnight said.

From a competitive standpoint, Dremio continues to evolve, he added. The vendor's peers include Databricks, one of the pioneers of the lakehouse format, and Cloudera, as well as tech giants such as AWS that offer lakehouses.

"Dremio [is] aiming to be at the forefront of the evolving data lakehouse market, particularly in its focus on open table formats and integrating intelligence to accelerate AI and analytics initiatives," McKnight said. "It will need to focus on integrating its platform with a wider range of ecosystems, tools and services to further strengthen this position."

Next steps

As Dremio plots its roadmap, its goal is to develop an intelligent lakehouse platform that serves the needs of users in the burgeoning AI era, Shiran said.

To do so, the vendor plans to continue building an architecture that serves not only humans with features that make them more efficient, but also AI agents that act autonomously and need to be properly trained and governed. Specific initiatives include improving enterprise-grade governance and delivering capabilities to users where they're needed, whether in public clouds, private clouds or on-premises, Shiran continued.

"Within this vision, we're expanding autonomous capabilities that eliminate manual intervention, deepening semantic understanding across enterprise data and continuing our commitment to open standards," he said. "These aren't separate initiatives, but interconnected aspects of our intelligent lakehouse approach."

McKnight, meanwhile, suggested that Dremio could grow by adding partnerships and integrations with more data sources and data management systems. In addition, integrating more AI throughout its own platform could benefit users.

"Developing strategic partnerships with key players in the data management and AI spaces could enhance the platform's capabilities and reach," McKnight said. "Integrating emerging AI technologies like natural language processing and deep learning could also enhance the platform's capabilities."

Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.
