Starburst advances Presto to handle Hadoop data better

Enterprise Presto SQL vendor Starburst updated its data query platform with expanded support for legacy Hadoop workloads as well as modern cloud data lake deployments.

Starburst's mission is to help organizations with data stored in Hadoop-based deployments to access and query that data quickly, using the open source Presto SQL query technology.

The data access and analytics vendor said on Wednesday that it updated the Starburst Enterprise Presto platform, which is based on the open source Presto distributed SQL project originally developed by Facebook.

The market for Presto-based technologies is growing, and other providers are entering it, notably Ahana, which launched its Presto services on June 30. Two active open source development efforts, PrestoSQL and PrestoDB, are behind Presto. The plan, according to Starburst, is to bring the two communities together under one umbrella at the Linux Foundation's Presto Foundation in the near future. Starburst is based on the PrestoSQL project, while Ahana is derived from PrestoDB.

Presto itself is finding favor with organizations looking to continue to use Hadoop big data deployments as well as data lakes. While many enterprises are shifting their long-term attention away from on-premises Hadoop deployments to cloud-based data lakes based on object storage, a large number of existing big data processing deployments are still on premises, said Matt Aslett, research director at S&P Global Market Intelligence.

"Presto can be used to accelerate distributed data processing projects based on both Hadoop and object storage, whether they reside on-premises or in the cloud," Aslett said. "As such, it provides a consistent compute layer that can support the ongoing use of existing investments and provide a migration path to facilitate the use of new cloud platforms."

Starburst Enterprise Presto adds new features

Matt Fuller, co-founder of Starburst, said Presto enables users to query data from a variety of data sources, including Hadoop and the cloud, as well as relational and non-relational database systems. With Starburst, an organization that has invested in Hadoop can continue to use its data through the Presto-based query approach, Fuller noted.
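To make the idea of querying several sources at once concrete, here is a minimal sketch in Python that only builds the text of such a query. Presto addresses tables as catalog.schema.table, with each catalog backed by a connector. The catalog names (`hive`, `postgresql`), the schemas, the tables and the columns below are all hypothetical, chosen for illustration rather than taken from Starburst's announcement.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a Presto-style three-part table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Build one SQL statement that joins tables from two different catalogs."""
    # Hypothetical catalogs: "hive" for data kept in Hadoop,
    # "postgresql" for a relational database.
    orders = qualified_name("hive", "sales", "orders")
    customers = qualified_name("postgresql", "crm", "customers")
    return (
        "SELECT c.region, count(*) AS order_count\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region"
    )


if __name__ == "__main__":
    # Print the query text; sending it to a cluster is out of scope here.
    print(build_federated_query())
```

The sketch stops at generating SQL. Running the statement would require a Presto cluster with both catalogs configured, which is not shown.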

"With this release, what we're announcing is really just better and more advanced integration with Hadoop," Fuller said. 

Among the enhanced Hadoop integrations in the new Starburst update is support for Cloudera CDP 7.1, a Hadoop data platform. Starburst also added support for the Hadoop platform from MapR, which Hewlett Packard Enterprise acquired in August 2019.

Starburst Enterprise Presto has a data connector system to query data located in multiple types of data sources.

Presto is helpful for querying cloud data lakes

One of the key use cases for Presto is querying cloud data lakes, such as those built on Amazon S3, which can be accessed through interfaces compatible with the Hadoop Distributed File System (HDFS). Starburst has a connector model for different data sources, including data lakes on AWS, Azure and Google.

"Presto is a really good tool to query from cloud data lakes," Fuller said. "That's actually what makes it really nice for companies that are transitioning because they can use the same tool today with Hadoop and not have to use a different tool tomorrow as they transition to a data lake."

It's increasingly common for organizations to use more than one cloud, which is another area in which Presto is useful. Presto can run on any cloud, and Starburst is also seeing it used to enable multi-cloud data lake queries, Fuller pointed out.

Improving Presto security with Apache Ranger

One of the key open source technologies used to help secure Hadoop is the Apache Ranger framework for data security.

While Ranger started out as a framework focused on Hadoop, Fuller noted that in recent years it has been more broadly deployed outside of the Hadoop ecosystem to secure data.

"You can think of Apache Ranger as a global place to store all your security policies for your data lake and other data sources," Fuller said.

Starburst already had some integration with Ranger and is now enhancing it with additional capabilities. One of the new features is support for SQL authorization. With it, users can grant and revoke access to particular tables in Starburst Presto, and that configuration will then be reflected in Apache Ranger.
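As an illustration of what granting and revoking table access looks like in SQL, the following Python sketch builds GRANT and REVOKE statements in the form Presto's SQL dialect accepts. The table name, the user name and the small set of allowed privileges are assumptions made for the example. How Starburst propagates these changes to Apache Ranger is not described in the announcement and is not modeled here.

```python
# Privileges this sketch accepts; an illustrative subset, not a full list.
ALLOWED_PRIVILEGES = {"SELECT", "INSERT", "DELETE"}


def _check(privilege: str) -> str:
    """Normalize a privilege name and reject anything outside the subset."""
    normalized = privilege.upper()
    if normalized not in ALLOWED_PRIVILEGES:
        raise ValueError(f"unsupported privilege: {privilege}")
    return normalized


def grant_statement(privilege: str, table: str, user: str) -> str:
    """Build a GRANT statement giving one user a privilege on one table."""
    return f"GRANT {_check(privilege)} ON {table} TO USER {user}"


def revoke_statement(privilege: str, table: str, user: str) -> str:
    """Build a REVOKE statement removing that privilege again."""
    return f"REVOKE {_check(privilege)} ON {table} FROM USER {user}"


if __name__ == "__main__":
    # Hypothetical table and user, for illustration only.
    print(grant_statement("select", "hive.sales.orders", "alice"))
    print(revoke_statement("select", "hive.sales.orders", "alice"))
```

The helpers interpolate names directly into the statement text, so they are suitable only for trusted input such as this example.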

Fuller said Starburst will continue to work on improving access to different types of data sources with Presto, as well as making the overall platform easier to use.

"You can expect to see more connectivity and a lot more performance for federated access to data," he said.
