
Quobyte CEO says customers need a scale-out mindset

To tackle AI, machine learning and other demanding workloads, enterprises need to break out of their appliance-centric approach to storage, according to Quobyte's CEO.

Bjorn Kolbeck, CEO of software-defined storage startup Quobyte, said most of the industry is still stuck in the past.

The business- and revenue-driving workloads in enterprises are demanding more and more compute and storage, and traditional, appliance-based storage infrastructure can't handle that, according to Kolbeck. Workloads such as machine learning and deep analytics require multiple cores and petabytes of data, but concentrating these high-demand workloads on single nodes leads to higher hardware costs and performance bottlenecks. The solution instead is to use scale-out architecture and to separate storage from its underlying hardware.

The technology for organizations to build this sort of infrastructure exists, but getting administrators to break from tradition is the biggest hurdle, Kolbeck said. Enterprises are still used to a "one app, one appliance" mentality.

Quobyte's distributed parallel file system can write to multiple storage servers and is billed as a drop-in replacement for NFS. It can be deployed on any x86 server, as well as on public clouds including AWS, Microsoft Azure, Google Cloud Platform and Oracle Cloud. It can also run in Kubernetes containers. The product is similar to parallel file systems such as the open source Lustre, IBM Spectrum Scale (formerly General Parallel File System) and Panasas' PanFS.
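
To ground what "runs in Kubernetes" can look like in practice, here is a minimal sketch using the official Kubernetes Python client to define a pod that mounts a shared volume through a CSI driver. The driver name csi.quobyte.com, the volume name and the mount path are illustrative assumptions, not confirmed product details.

```python
# Sketch: a pod mounting a shared file system volume via a CSI driver,
# built with the official Kubernetes Python client. The driver name
# "csi.quobyte.com" and the volume attributes are assumptions for
# illustration, not confirmed Quobyte configuration.
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="training-job"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="trainer",
                image="python:3.11",
                # The application just sees a POSIX path -- the sense in
                # which a parallel file system can stand in for NFS.
                volume_mounts=[
                    client.V1VolumeMount(name="data", mount_path="/mnt/data")
                ],
            )
        ],
        volumes=[
            client.V1Volume(
                name="data",
                csi=client.V1CSIVolumeSource(
                    driver="csi.quobyte.com",  # assumed driver name
                    volume_attributes={"volume": "ml-datasets"},  # hypothetical
                ),
            )
        ],
    ),
)

# Render to a plain dict, e.g. to dump as a YAML manifest.
print(client.ApiClient().sanitize_for_serialization(pod))
```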

In this Q&A, Kolbeck discusses what's driving customers' storage needs today and why appliances can't help them meet those requirements.

What are customers' data storage challenges today?


Bjorn Kolbeck: Today we see problems where, suddenly, HPC [high-performance computing] has come to the enterprise. Enterprises now want to run large-scale machine learning and analytics -- 10 petabytes (PB) is often considered small. But the IT department is still stuck with the technology that they have used since the early 2000s, when we switched to virtualization and the enterprise storage associated with that. They're looking at those tools and they don't understand that this cannot serve the application.

These customers bring in more and more of these monolithic appliances and then bring in a lot of professional services to make it work somehow, instead of understanding that the problems have shifted to scale out, that storage now needs to change, too. They're banging their heads against the wall trying to solve new problems with 20-year-old technology.

That's, I think, where software-defined storage and scale-out storage come in.

What are the constraints of traditional storage architecture?

Kolbeck: The first problem is NFS [Network File System]. It's the protocol most people use to access their storage architecture.

NFS was developed 35 years ago. Back then, we had very, very different problems. NFS was designed for workstations to access a single storage server, so the protocol itself is many-to-one. It doesn't do failover, it doesn't do multiple connections to multiple servers, and we're still stuck with that.

And even if you look at storage companies that started recently, a lot of them rely on NFS even when your application requires scale-out. So, if you run 100 jobs on 100 GPUs, all needing to access your data in parallel, and you go through NFS, then you have artificial bottlenecks. And the only way to get rid of that is to fix the protocol.
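
Kolbeck's bottleneck argument can be made concrete with a toy model (invented numbers, not Quobyte code): if all clients funnel through a single NFS endpoint, aggregate bandwidth is capped by that one server's network link, while striping across multiple servers lets supply grow with the cluster.

```python
# Toy model of the many-to-one bottleneck; every figure here is
# invented for illustration and is not a benchmark or Quobyte code.

CLIENT_DEMAND_GBS = 2.0  # GB/s each GPU job wants to read (assumed)
SERVER_NIC_GBS = 12.5    # GB/s per storage server, ~100 GbE (assumed)

def aggregate_bandwidth(num_clients: int, num_servers: int) -> float:
    """Total read bandwidth: capped by server NICs or client demand."""
    supply = num_servers * SERVER_NIC_GBS
    demand = num_clients * CLIENT_DEMAND_GBS
    return min(supply, demand)

# 100 parallel jobs against one NFS endpoint vs. a striped cluster.
for servers in (1, 4, 16):
    total = aggregate_bandwidth(num_clients=100, num_servers=servers)
    print(f"{servers:2d} server(s): {total:6.1f} GB/s total, "
          f"{total / 100:.2f} GB/s per job")
```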

The second problem is scale-out. Companies are moving away from monolithic, scale-up applications like SQL databases because it's cheaper and easier to scale out and get massive performance than to squeeze the maximum performance from a single node.

The idea of scale-out is to run massive workloads that won't fit into a single machine -- and doing the same thing for storage is a huge advantage. First of all, the storage can then scale together with the application and serve many servers in parallel. But also on the cost side, with cheaper standard servers for the storage system instead of trying to squeeze everything onto a single high-end machine, I can build a massively more cost-effective storage system and still scale it out to the necessary hundreds or thousands of storage nodes.
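
The cost argument lends itself to a quick back-of-the-envelope comparison. The sketch below uses invented prices and throughput figures purely to illustrate the shape of the trade-off between a few high-end appliances and a pool of commodity servers.

```python
# Back-of-the-envelope cost comparison; all prices and throughput
# figures are invented for illustration, not vendor data.
import math

commodity = {"price_usd": 15_000, "gbs": 5.0}    # standard x86 server (assumed)
high_end = {"price_usd": 250_000, "gbs": 40.0}   # monolithic appliance (assumed)

target_gbs = 200.0  # aggregate GB/s the application tier needs

for label, node in (("commodity", commodity), ("high-end", high_end)):
    count = math.ceil(target_gbs / node["gbs"])
    total = count * node["price_usd"]
    print(f"{label:>9}: {count:3d} nodes, ${total:,} total")
```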


Software-defined storage and scale-out storage technology exist, so why is data growth still a problem?

Kolbeck: The dominant storage systems are still very appliance-based. Switching from this very hardware-centric view to software storage that's just an application is a huge leap for admins to make or to understand. It's like the admins that, back when VMware was new, thought virtualization was too dangerous. They said, 'I want everything on my bare-metal machines, and I have one machine per application.'

It's the same thing now. We have admins who say storage needs to be an appliance, and everything else is too complicated. They don't understand that the applications change. And by being too conservative, doing the same thing again, they're part of the problem.

Now, the application users -- the data scientists, the developers -- they have already moved to scale-out. They use scale-out NoSQL databases, they run Hadoop clusters, they run distributed machine learning. They understand scale-out well, and if IT departments don't deliver, they are seen as a problem and an inhibitor.

How many customers do you have now, and what industries are they in?

Kolbeck: We have north of 50 customers. Our product as a file system could work anywhere, but we focus on a few verticals. As a startup, that's necessary for us.

There's finance, which needs fraud detection and has the typical problems that come with having a lot of data. Autonomous driving is another area. Right now, it's a lot more like assisted driving systems than futuristic, truly autonomous driving, but we see huge projects with large software teams.

For life science, the genome sequencers and microscopes are creating a lot of data, and that has created pressure on them to find new solutions on the storage side. We're also in the entertainment market, focusing on streaming, where there's a ton of very valuable data.

The final segment is traditional HPC, especially at universities. They have large clusters and high visibility, and during [COVID-19], they also got significant additional funding.

Who are your competitors?

Kolbeck: The clear No. 1 is EMC Isilon, and we see newer competitors only occasionally.

To some degree, we compete against NetApp and Pure, but they only have monolithic appliances. I would say customers who try to use them for scale-out workloads might run into a situation where we compete against them, but we're not going after their single-filer business.
