
Vast Data, Vertica to deliver data lakehouse and analytics

Vast Data combined its disaggregated 'share everything' storage with Vertica's high-performance analytics to create a data lakehouse that answers queries in near real time.

Vast Data combined its all-flash, high-performance storage with Vertica's Eon Mode Architecture to deliver data warehouse-like query response times on data lakes in a converged product.

The two vendors worked together to create a data lakehouse, which combines the simplicity and low cost of a data lake with the analytical ability of a data warehouse. The product is available now.

Vast's Universal Storage product is the storage foundation of the data lakehouse that brings more performance, better density through compression and a dedicated quality of service, according to the company.

Historically, data lakes have been silos for large amounts of raw data, according to John Mao, global head of business development at Vast Data. Customers tell Vast that data lakes contain beneficial information, but that it is very hard to reach. In data warehouses, by contrast, information is analyzed much more quickly.

"Data scientists are trying to find a needle in a haystack [in data lakes], finding something interesting that they maybe previously didn't know," Mao said.

Companies can converge data lakes and data warehouses on premises, but they need the right infrastructure to do so, he said.

Not all organizations are moving workloads to the cloud, according to Julia Palmer, an analyst at Gartner. Companies staying on premises require modern infrastructure with better efficiency, performance, density and scale.

"Next-generation workloads will require more scalable platforms for both storage and performance aspects," she said.

The Vast Data and Vertica architecture for the data lakehouse.

Storage for lakehouses

Vertica builds extremely fast massively parallel processing databases. Vertica's high-performance analytics are geared toward data warehouses where faster performance is required, Mao said.

The problem is that Vertica had used a traditional infrastructure approach that tightly coupled hardware and software, Mao said. Vertica's Eon Mode Architecture separates compute from storage, which enables Vast Data to provide its disaggregated, "share everything" flash storage architecture in a single tier.

As the concept of data lakehouses gains momentum, Mao said, IT pros need to be able to query the unstructured and semi-structured data in a data lake faster. This is where Vast Data and its high-performance, dense storage come in.

"IT leaders who build modern data analytics platforms on prem were forced to compromise and had multiple products for different stages of data processing," Gartner's Palmer said. "Now they are increasingly seeking one single platform that will deliver on simplicity, centralized data management and will be cost-efficient at scale."

Other object storage vendors have similar integrations, she said, but Vast's infrastructure provides disaggregated scalability while using lower-cost quad-level cell flash for capacity and storage-class memory for performance. Vast is designed for large-scale deployments, such as a data lake, while helping to solve the price-to-performance issue, according to Palmer.


Faster and denser

Vast claims that its Universal Storage with Vertica can perform database queries three times as fast as legacy all-flash products.

This performance can cut query times in data lakes from four or five days to hours, Mao said. The speedup comes from combining Vast's all-flash object storage platform with Vertica's database query engines. The example should be viewed in the context of the volume of data being queried.
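How much the volume of data matters can be shown with rough arithmetic. The sketch below estimates how long a single full scan would take for 10 TB and for 10 PB at one constant read rate. The 50 GB/s throughput figure and the names in the code are assumptions chosen for illustration; they are not measurements or figures published by Vast Data or Vertica.

```python
# Back-of-the-envelope scan times. The throughput figure is a made-up
# assumption for illustration, not a measured or vendor-published number.

TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes

ASSUMED_THROUGHPUT = 50 * 10**9  # hypothetical aggregate scan rate: 50 GB/s


def scan_seconds(data_bytes: int, throughput: int = ASSUMED_THROUGHPUT) -> float:
    """Time to read every byte once at a constant aggregate throughput."""
    return data_bytes / throughput


small = scan_seconds(10 * TB)
large = scan_seconds(10 * PB)

print(f"10 TB full scan: {small / 60:.1f} minutes")
print(f"10 PB full scan: {large / 3600:.1f} hours")
print(f"Ratio: {large / small:.0f}x")
```

Under that assumed rate, a full scan of 10 TB finishes in a few minutes, while 10 PB takes more than two days. Real query engines avoid full scans where they can, so this only illustrates the scaling, not actual query times.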

"It's a very different philosophy when you're trying to scan for 10 terabytes of data versus 10 petabytes of data," Mao said.

The alliance between the companies allows the data lake to perform more like a data warehouse, he said. The massive data set in the data lake can now be queried ad hoc, as in a data warehouse.

Increasing performance is top of mind in a data lakehouse, but the massive amount of data also needs to be stored. Vast Data's Universal Storage uses similarity-based compression, which groups similar blocks of data in a cluster together, to provide what the vendor said is twice the density.
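The general idea behind compressing similar blocks together can be illustrated with a toy example. The sketch below is not Vast Data's algorithm, which the article does not describe in detail. It uses Python's standard zlib module and made-up data, skips the step of detecting which blocks are similar, and simply compares compressing four near-identical blocks one at a time with compressing them as a group.

```python
import random
import zlib
from typing import List

# Toy data: four 2 KB blocks that are near-copies of one another.
# The content is pseudo-random, so each block barely compresses on its own.
rng = random.Random(0)
base = bytes(rng.randrange(256) for _ in range(2048))


def near_copy(block: bytes, position: int) -> bytes:
    """Return a copy of `block` with a single byte changed."""
    changed = bytearray(block)
    changed[position] = (changed[position] + 1) % 256
    return bytes(changed)


blocks = [base] + [near_copy(base, p) for p in (100, 900, 1700)]


def size_compressed_separately(blocks: List[bytes]) -> int:
    """Total size when every block is compressed in isolation."""
    return sum(len(zlib.compress(b)) for b in blocks)


def size_compressed_together(blocks: List[bytes]) -> int:
    """Size when the similar blocks are compressed as one stream."""
    return len(zlib.compress(b"".join(blocks)))


print("separately:", size_compressed_separately(blocks), "bytes")
print("together:  ", size_compressed_together(blocks), "bytes")
```

Compressed as a group, the later blocks can be encoded mostly as references to the first one, so the combined output is smaller than the sum of the individually compressed blocks. A production system would also need a way to find similar blocks across a cluster, which this sketch leaves out.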

The alliance between Vast and Vertica isn't the only data lakehouse game in town.

Databricks is a pioneer of the data lakehouse concept. Databricks is 100% in the cloud, whereas Vast Data is not. Vast's customers aren't moving to the cloud, as some of them deploy more than 200 PB of data, according to Mao. That would not be an easy migration, and it would be extremely expensive, particularly in egress fees, he said.

"[The cloud] is kind of a little bit of a Hotel California," Mao said. "Once you're there, you're not ever leaving."

Still, the cloud is a good fit for data warehouses, Mao said. With Vertica and Vast, a customer can put the data warehouse in the cloud and use Vast to keep the data lake on premises. Vertica can query either in the same manner.
