
Dremio opens up data lakehouse with new engine

The data lakehouse vendor is expanding its cloud platform with a new SQL query engine and a data metastore for data lakes, both built on top of the Apache Iceberg table format.

Dremio, at its Subsurface virtual conference on March 2, made its Sonar query engine generally available and released a preview of the new Arctic metadata management service for its data lakehouse cloud platform.

The data lakehouse vendor, based in Santa Clara, Calif., has been building out its platform in recent years -- merging the capabilities of data warehouses and data lakes.

The new Dremio Sonar query engine is built on top of the open source Apache Iceberg technology, which provides data table services for data lakes. 

Sonar supports SQL Data Manipulation Language (DML) statements, which enable users to insert, update and delete information directly in a data lake. The other new feature is Dremio's Arctic data metastore, which aims to replace Apache Hive metastore technology.

"The Lakehouse concept, the idea that organizations will be able to consolidate multiple workloads onto a single data platform, is certainly gaining advocates and vendor support," said Constellation Research analyst Doug Henschen.

"The promise is consolidation of platforms and reduced cost, but organizations will have to make sure that a single platform meets their BI [business intelligence], analytics, data science and engineering needs," he continued.

Dremio co-founder and chief product officer Tomer Shiran, shown at the vendor's virtual conference, previewed the new Arctic data metastore technology that is intended to replace Apache Hive.

Building out the data lakehouse to replace data warehouses

Henschen said he sees the new functionality that Dremio unveiled on Wednesday as aimed at BI and analytics professionals.

For example, he noted that Dremio is enhancing its platform with DML update and delete capabilities that round out the record-level data manipulation that data professionals expect from a data warehouse platform.


In the opening keynote for the Subsurface event, Dremio's co-founder and chief product officer, Tomer Shiran, fleshed out the data lakehouse concept.

With the data lakehouse, rather than bringing data into a query engine, users bring the query engines to the data, Shiran said. So data stored in cloud object storage such as Amazon S3 can be queried by any number of different technologies and users don't have to move data into a data warehouse to use it.

Dremio Sonar provides new data lakehouse query engine

The new Dremio Sonar query engine is powered by the open source Apache Arrow technology.

Among Sonar's features is the ability to query data wherever it resides. Shiran said queries can be run against a data metastore such as Apache Hive, directly against data in a data lake, or even against a relational database.

Sonar also supports DML queries that enable users to insert, update and delete records in data lakes. The DML capability uses the open source Apache Iceberg technology for data lake tables and the Apache Parquet data format.
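The record-level DML that Sonar adds uses standard SQL syntax. As a rough illustration of the insert, update and delete operations in question -- using Python's built-in sqlite3 as a stand-in engine, not Dremio's own API -- the statements look like this:

```python
import sqlite3

# Stand-in engine: sqlite3 here only illustrates the same record-level
# DML (INSERT/UPDATE/DELETE) that Sonar runs against Iceberg tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")

# INSERT: add new records
conn.execute("INSERT INTO orders VALUES (1, 'open'), (2, 'open')")

# UPDATE: modify a record in place
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")

# DELETE: remove a record
conn.execute("DELETE FROM orders WHERE id = 2")

rows = conn.execute("SELECT id, status FROM orders").fetchall()
print(rows)  # [(1, 'shipped')]
```

The point of the Iceberg-backed DML is that these same familiar statements now apply to files sitting in a data lake, without first loading them into a warehouse.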

"Apache Iceberg is a table format that is built on top of Parquet, so you can start thinking of your data not as files but as tables," Shiran said.
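Shiran's point -- that a table format lets you treat a set of Parquet files as one logical table -- can be sketched with a toy metadata layer. This is a deliberately simplified illustration, not Iceberg's actual metadata structure, which also tracks manifests, schemas and partition specs:

```python
from dataclasses import dataclass, field

# Toy sketch of the table-format idea: a "table" is metadata that
# points at a set of data files, rather than the files themselves.
@dataclass
class TableMetadata:
    name: str
    schema: dict                                     # column name -> type
    data_files: list = field(default_factory=list)   # e.g. Parquet paths
    snapshots: list = field(default_factory=list)    # history of file sets

    def append_files(self, paths):
        # Each commit produces a new snapshot, so readers always see
        # the table as one consistent set of files.
        self.data_files = self.data_files + list(paths)
        self.snapshots.append(list(self.data_files))

table = TableMetadata("orders", {"id": "int", "status": "string"})
table.append_files(["s3://lake/orders/part-0.parquet"])
table.append_files(["s3://lake/orders/part-1.parquet"])
print(len(table.snapshots), len(table.data_files))  # 2 2
```

A query engine consulting this metadata sees "orders" as a table with a schema and a current file set, which is what lets users stop thinking in terms of individual files.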

Dremio Arctic enables a data lake metastore

Shiran, in his keynote, also publicly previewed Dremio Arctic, which he described as an intelligent metastore for Apache Iceberg.

Shiran explained that Arctic will work with other data lake query engines, including Apache Spark, Trino and Presto -- not only Dremio Sonar. Dremio's goal is to create a modern metastore for data lakehouse deployments.

"For a very long time, the only kind of metadata management capability in the lake was the Hive metastore, which is one of the last remaining pieces of the original Hadoop stack," Shiran said. "We thought it was the right time and it is actually necessary to provide something a lot more sophisticated, much more capable than what Hive metastore can provide."
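The role a metastore plays in this arrangement -- a shared catalog any engine consults to resolve a table name to its current metadata before planning a query -- can be sketched in a few lines. This is a toy illustration of the concept, not Arctic's actual interface:

```python
# Toy catalog: multiple query engines resolve table names through one
# shared metastore instead of each keeping its own file lists.
# (Illustrative only; not Dremio Arctic's real API.)
catalog = {}

def register_table(name, metadata_location):
    catalog[name] = metadata_location

def resolve(name):
    # Any engine (Sonar, Spark, Trino, Presto) would do this lookup to
    # find the table's current metadata before planning a query.
    return catalog[name]

register_table("orders", "s3://lake/orders/metadata/v2.json")
print(resolve("orders"))
```

Because every engine goes through the same catalog, they all agree on what the table currently contains -- the coordination job the Hive metastore has historically done for Hadoop-era stacks.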
