A look at Presto, Trino SQL query engines

The co-creator of the open source project at Facebook reflects on 10 years of growth as he helps lead one of its resulting tools into the future.

On Aug. 8, 2012, a group of engineers at Facebook started the Presto project, introducing a new SQL query engine to help the social media giant scale.

After a decade of growth, the technology is more relevant than ever, providing an open source approach that enables organizations to easily query data wherever it might reside. But the Presto project's path to success hasn't been a straight line: it has experienced both drama and growth over the last decade.

In 2018, after the original founders of the Presto project left Facebook, the technology was divided into two separate projects: PrestoDB and PrestoSQL. The division led to two rival software foundations and, in January 2021, to the rebranding of PrestoSQL as Trino.

Multiple commercial vendors of the technology have also emerged over the past decade, including Ahana for PrestoDB and Starburst for Trino.

The pioneering impact of Presto

The effect Presto has had on the data community over the past decade is not lost on industry analysts.

"Trino and Presto helped drive the rise of the query engine, which helps enterprises maintain fast data access even as their environments grow more complicated," said Kevin Petrie, analyst at Eckerson Group. "The query engine uses familiar SQL commands to retrieve data from data stores at low latency and high throughput."

Data stores include SQL databases, NoSQL databases, object stores and file systems, according to Petrie. He added that the Presto and Trino query engines also enable enterprises to support business intelligence and other analytics projects on high volumes of data, particularly in environments such as data lakes.

Hyoun Park, CEO and analyst at Amalgam Insights, said that in his view, Presto was the first scale-out, serverless analytics engine for distributed data when it was introduced. At the time, it moved analytics beyond the traditional single source of truth toward a more open environment for interactive, SQL-based querying across a wide variety of data sources.

"The ability to do analytics on the data as a concept owes a great deal of gratitude to PrestoSQL and Trino for both popularizing and demonstrating the concept," Park said.

Built to last

From its earliest days, a key goal of the Presto project was to provide a foundational technology that would last a decade or more, according to Dain Sundstrom, co-creator of Presto and Trino, and now CTO at Starburst.

"I actually very clearly remember the conversation we had when we were starting this project," Sundstrom said. "When we were starting working on Presto, we were all like, 'Let's try and make this like PostgreSQL,' which is a database we all really love."

They liked PostgreSQL for its open source community and its longevity as a database, he said.

The team that built Presto was familiar with existing analytics databases, including Teradata and Netezza. But according to Sundstrom, a decade ago there were no good options for open source analytics databases. Analytics databases didn't work well -- if at all -- with Hadoop and cloud object storage.

So, when Facebook attempted to work with large amounts of data, it didn't have a good option, according to Sundstrom. "We could have used Hive, but it's really slow, and we realized we really needed to build something," he said.

The future of Trino

Since 2018, Trino and Presto have been separate projects, each with its own direction.

A key focus for Trino over the last several years has been performance, benefiting large users such as Robinhood and DoorDash, both of which detailed their use of the project at the 2021 Trino Summit. Fault tolerance and resilience have also been key areas of focus for the Trino project, enabling SQL queries to scale more reliably.

Among the new capabilities being developed in the Trino community are polymorphic tables (formally, polymorphic table functions, introduced in the SQL:2016 standard). Sundstrom explained that polymorphic tables give users a standard SQL way to embed complex execution capabilities in the middle of a query.

"Polymorphic tables provide new and interesting ways to connect into non-SQL data sources," he said.

After 10 years, Sundstrom is satisfied that the project he helped create continues to make an impact and to benefit from contributions by others in the open source community.

"I always want to see more people involved, and I want the community to be diverse, and I think we're doing a really good job," he said. "We just got large contributions from Bloomberg, LinkedIn and several other companies that use Trino at scale internally."
