Voltron Data takes aim at data management with Apache Arrow

CEO and co-founder Josh Patterson outlines the promise of the new language-agnostic data analytics development framework and why the vendor is raising big money to support it.

Enabling cloud data lakes to be queried quickly typically requires a series of different technologies.

One technology that often appears in a cloud data lake analytics stack is the open source Apache Arrow project, which helps accelerate data queries with a columnar in-memory format for data analytics.

Arrow is used by multiple vendors today, including Dremio, Snowflake and Databricks.

Now another player is entering the fast-growing niche.

San Francisco-based startup Voltron Data said on Feb. 17 that it has raised $110 million in funding in a bid to further extend the Apache Arrow technology and provide enterprise support capabilities.

Voltron Data's founders include Wes McKinney, who helped create Apache Arrow; Rodrigo Aramburu, a co-creator of the BlazingSQL distributed SQL query engine; and Josh Patterson, whose experience includes helping create the RAPIDS open source GPU-accelerated data science project at gaming and AI hardware and software giant Nvidia.

In this Q&A, Patterson, now CEO of Voltron Data, details the challenges and opportunities for building out a new ecosystem around the Apache Arrow project.

Why did you start Voltron Data as a platform on top of Apache Arrow?

Josh Patterson: My co-founders and I realized from our past experiences that if you want to build new data science software, it should be built on Apache Arrow, because it's a modular composable standard.

We thought it was almost natural to align different parts of the data ecosystem with even more open source projects and people who contribute to what we think are amazing open source standards and put them under one roof.

We have our first product announcement coming in a few weeks, so I can't say too much, but our first products are around bringing open source libraries to market better, making them easier to use and reducing barriers to adoption.

There are a lot of projects that are kind of unsung heroes and backbones of the data analytics ecosystem. We will make sure that they get the attention they need and that they have all the support they need.

What makes Apache Arrow so good and how are you looking to help improve it with Voltron Data?

Patterson: Apache Arrow is already a great technology. I think one thing that people sometimes don't know or miss is how wide and deep the project is. For example, there's Skyhook, which comes from a group at the University of California, Santa Cruz, working on how to use Ceph with Apache Arrow.

There's also work in the geospatial space with Arrow and setting up common standards to represent geospatial data. There are all these different language bindings for Arrow, including Go, Rust, Ruby, JavaScript and Java. There are also accelerators for using Arrow with things like RAPIDS for Nvidia GPUs.

There are so many things going on in the Arrow ecosystem, but sometimes there's just a little barrier; there's a little bit of friction. And if we can remove that friction, if we can make it a little bit more polished, a bit more user-friendly, then adoption can grow even more and that's what Voltron Data is really here to do.

What tend to be the biggest challenges or barriers to adoption for Apache Arrow in data platforms today?

Patterson: Typically, wherever you see many data silos and many different languages in use, that is where you see Apache Arrow coming into play today.

The challenges of adoption are primarily around legacy technology. Sometimes, it's hard to modernize systems when they are basically in some form of maintenance mode. Those places are sometimes the most resistant to change.

So if there is a monolithic system in place with a single language and it is performing fine, it's hard to get people to modernize a system if it's not broken. We typically see the need for Arrow on the systems that are stressed, overworked and burdened. Those systems need increasing efficiencies and they need a bridge to new technology.
