Upsolver advances open cloud data lake, data pipeline efforts

Upsolver enhanced its data preparation platform to transform data lake content into a data lakehouse structure that enables data queries and analysis.

Getting data into a cloud data lake is one thing. Bringing that data into the right structure so that it is useful for data queries and analytics is a different and more difficult undertaking.

Data preparation vendor Upsolver is looking to make it easier for organizations to get data into cloud data lakes, using an approach that enables what the company refers to as an open cloud data lakehouse.

The concept of a cloud data lakehouse was first espoused by Databricks as an approach that joins capabilities from data warehousing and data lakes. With a cloud lakehouse, data is stored in a data lake in a format that allows users to query it directly.

Upsolver's platform enables users to load data into a data lake and then transform it in an extract, transform, load (ETL)-type process so that it's usable by query engines. On Oct. 7, Upsolver made generally available new data ingestion connectors that enable the company's platform to work with Amazon Redshift and Snowflake. Upsolver had previously focused largely on support for the Amazon Athena query engine.

Creating a data pipeline to enable a cloud data lakehouse

Among the vendor's customers is business intelligence and analytics vendor Sisense, which has been using Upsolver to build a data pipeline to make sense of its own data that is stored in Amazon S3.

"We understood that we have to clean, transform and prepare the data before we start doing stuff with it. Of course, for BI and analytics we are using Sisense, but we needed the data pipeline components," said Guy Boyangu, CTO and co-founder of Sisense.

Before engaging with Upsolver, Sisense would have needed a dedicated team of data engineers to build out the data pipelines the company required, Boyangu said. He noted that Sisense doesn't have any specific business partnership with Upsolver for a customer-facing offering that Sisense would offer to its end users.

Sisense uses a single data lake where it cleanses, transforms and structures data before sending it to outlets such as Athena, Snowflake or its own in-memory database, Boyangu said.

"We use Upsolver for cloud data pipeline management," he said.

Upsolver's data dashboard provides visibility into data transformation activities.

The Upsolver approach to cloud data lakehouse

The basic idea behind his company's technology is to prepare data for the cloud, said Ori Rafael, CEO and co-founder of Upsolver.

With traditional databases and data warehouses, an ETL process is used to load data directly. With Upsolver, data that is already in a data lake is prepared so that a query engine can use it directly for business intelligence or analytics.
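As a rough illustration of what preparing lake data for a query engine can involve, the Python sketch below cleans raw JSON event lines and writes them into date-partitioned folders, a layout that query engines can commonly read. It is not Upsolver's API or method. The function name `prepare_events`, the field names (`user_id`, `timestamp`, `action`) and the `events/date=.../part-0000.csv` layout are all assumptions made for this example, and CSV stands in for the columnar formats a production pipeline would more likely use.

```python
import csv
import json
from collections import defaultdict
from datetime import date
from pathlib import Path

# Hypothetical output schema for this sketch.
FIELDS = ["user_id", "timestamp", "action"]


def prepare_events(raw_lines, lake_root):
    """Clean raw JSON event lines and write them as date-partitioned CSV files.

    Returns a dict mapping each partition date to the number of rows written.
    """
    partitions = defaultdict(list)
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop malformed records
        if not isinstance(event, dict):
            continue  # drop JSON values that are not objects

        user_id = str(event.get("user_id", "")).strip()
        timestamp = str(event.get("timestamp", ""))
        try:
            # Assumes ISO 8601 timestamps; the first 10 characters are the date.
            day = date.fromisoformat(timestamp[:10]).isoformat()
        except ValueError:
            continue  # drop records without a usable date
        if not user_id:
            continue  # drop records without a user

        partitions[day].append({
            "user_id": user_id,
            "timestamp": timestamp,
            "action": str(event.get("action", "unknown")).lower(),
        })

    written = {}
    for day, rows in sorted(partitions.items()):
        # One folder per day, named date=YYYY-MM-DD.
        part_dir = Path(lake_root) / "events" / f"date={day}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with (part_dir / "part-0000.csv").open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)
        written[day] = len(rows)
    return written
```

In this sketch, calling `prepare_events(lines, "/tmp/lake")` on a list of raw event strings would leave one folder per day under `/tmp/lake/events/`, with malformed or incomplete records dropped along the way.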

When it was founded in 2018, Upsolver had connectors for Amazon Athena users. Since then, customers have asked the vendor to expand support to Redshift and Snowflake, Rafael said. Because many organizations now use multiple query and analytics engines, the need for a cloud data lakehouse approach has arisen, he added.

"With a lakehouse you're getting the cost advantages of a data lake, but you're managing to use the engines you're already using today, providing easy access," Rafael said. "A lakehouse is the data lake without all the limitations and the difficulty to access the data."

Upsolver currently works only with AWS cloud environments, but the vendor plans to expand support to other clouds, Rafael said.