Databricks builds out SQL Analytics for data lakehouse

Databricks is building out its lakehouse platform with a new SQL Analytics service that will make it easier to run SQL queries with better visualization in the cloud.

Databricks is putting more substance behind its data lakehouse model, with a new SQL Analytics service, revealed Nov. 12, that is part of the company's Unified Data Analytics Platform.

The data lakehouse is a concept that the data science and engineering vendor has been advocating over the course of 2020 as a technical architecture that combines the best elements of data lake and data warehouse models.

The technology foundation for Databricks' vision of the lakehouse is an open source project known as Delta Lake, which is currently hosted by The Linux Foundation. In June, Databricks expanded on Delta Lake with the launch of its Delta Engine, which adds Spark 3.0-based data queries and caching to the lakehouse.

The Databricks SQL Analytics service brings Delta Engine into the Databricks platform to help customers use the lakehouse model. The new service also integrates technologies from data visualization vendor Redash, which Databricks acquired in June.

While Databricks unveiled the SQL Analytics service today, it will be available only as a preview starting Nov. 18. The vendor said it expects general availability to follow in early 2021.

Why the data lakehouse concept works

The lakehouse concept that is at the core of the Databricks service makes good sense to Hyoun Park, CEO and principal analyst at Amalgam Insights.

Park said the lakehouse that Databricks advocates rests on the idea that data lakes, which are collections of data sources in a variety of formats, need to be both governed and available for analysis so that users can make sound data-based decisions.

"The data warehouse has been an extremely powerful tool for unlocking analytics but is becoming slightly outdated in an era that data is everywhere, being created all the time, and stored in a wide variety of formats," Park said. "In this context, the idea of a lakehouse as a data lake that performs data warehouse-like purposes is an important step forward for the analytics community."

Databricks' new service integrates data visualization capabilities to help users make more use of data.

How Databricks SQL Analytics advances the lakehouse

In Park's view, the new Databricks SQL Analytics service is significant because it bridges gaps for data analysts and data scientists who need to bring semistructured data into both analytic and data science efforts quickly, with the governance and performance needed for production environments.

"With this service, Databricks increases the availability of data lakes for analytic usage and helps unlock the insights and guidance currently hidden in semistructured data sources that have traditionally been difficult to both connect and analyze in context of traditional structured data sources," Park said.
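
To make Park's point concrete, the following minimal sketch shows semistructured records being analyzed in the context of a structured table. It is an illustration only: it uses Python's built-in sqlite3 module as a stand-in query engine, not Databricks SQL Analytics or Delta Lake, and the table names, fields and sample records are all hypothetical.

```python
import json
import sqlite3

# Hypothetical semistructured input: JSON events whose fields vary by record.
raw_events = [
    '{"customer_id": 1, "event": "click", "meta": {"page": "pricing"}}',
    '{"customer_id": 2, "event": "click"}',
    '{"customer_id": 1, "event": "purchase", "meta": {"amount": 40}}',
]

# In-memory SQLite is an assumed stand-in engine, not the Databricks service.
conn = sqlite3.connect(":memory:")

# A conventional structured table.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "EMEA"), (2, "APAC")])

# Flatten each JSON record into columns; absent fields are stored as NULL.
conn.execute("CREATE TABLE events (customer_id INTEGER, event TEXT, page TEXT)")
for line in raw_events:
    record = json.loads(line)
    page = record.get("meta", {}).get("page")
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?)",
        (record["customer_id"], record["event"], page),
    )

# Analyze the semistructured events in the context of the structured table.
events_by_region = conn.execute(
    """
    SELECT c.region, COUNT(*) AS n_events
    FROM events AS e
    JOIN customers AS c ON c.id = e.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()

print(events_by_region)
```

Once the JSON records are flattened into a table, a single SQL join covers both kinds of data.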

Arsalan Tavakoli, senior vice president of field engineering at Databricks, explained that the SQL Analytics service enhances the SQL query capabilities that Databricks has long provided with its platform.

Tavakoli noted that the service offers better support for business intelligence tools, with connectors that help users access lakehouse data. Query concurrency has also improved, so more users can query a given data source at the same time.

Redash integration provides lakehouse visibility

Databricks SQL Analytics marks the official debut of Redash technology in the Databricks stack. Redash is an open source technology that lets users query a data set through a SQL interface.
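
As a rough sketch of that general pattern, a SQL query whose result feeds a visualization, the example below aggregates a small table into chart-ready records. It does not use Redash or Databricks APIs: Python's built-in sqlite3 module stands in for the query engine, and the `orders` table and its values are hypothetical.

```python
import sqlite3

# In-memory SQLite is an assumed stand-in; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2020-11-10", 120.0), ("2020-11-10", 80.0), ("2020-11-11", 50.0)],
)

# The analyst writes SQL; the result set becomes the data behind a chart.
cursor = conn.execute(
    """
    SELECT order_day, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_day
    ORDER BY order_day
    """
)
columns = [description[0] for description in cursor.description]

# One dict per row: a shape most charting libraries can consume directly.
chart_series = [dict(zip(columns, row)) for row in cursor.fetchall()]

for point in chart_series:
    print(point["order_day"], point["revenue"])
```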

Tavakoli said that since the acquisition, Databricks has been working on security and performance improvements for production workloads. He added that with the Redash integration, Databricks users can now choose to get a data science view or a SQL Analytics view of data within the same environment.

"It's not a standalone product now. Redash has now been fully integrated within Databricks," Tavakoli said. "So, we're pulling in all the Redash capabilities, but natively integrated with versioning, security and collaboration and deeply integrating it with all of the underlying infrastructure."
