Flash startup Vast Data strives to make disk storage extinct
Vast Data CEO Renen Hallak has immense plans for storing big data on all-flash systems and predicts his startup's combination of storage class memory and consumer-grade QLC drives will make disk go the way of the dodo.
Vast Data came out of stealth in 2019 with all-flash storage designed to serve all applications, including those with data traditionally stored on spinning disk because of cost. The startup's Universal Storage hardware combines Intel Optane persistent memory chips and consumer-grade QLC NAND to provide fast access without tiered storage. Intel Optane drives are made from 3D XPoint memory, which is said to be a cross between NAND flash and dynamic RAM.
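One way to square consumer-grade QLC with enterprise workloads, sketched below with invented names and sizes rather than Vast Data's actual design, is to absorb small writes in fast persistent memory, acknowledge them immediately, and destage them to QLC in large sequential stripes that the low-endurance cells tolerate well:

STRIPE_SIZE = 1 * 1024 * 1024  # assumed destage unit: 1 MB stripes


class QLCBackend:
    """Stands in for low-endurance QLC drives, which prefer large
    sequential writes over many small random ones."""

    def __init__(self):
        self.stripes = []

    def append_stripe(self, stripe: bytes):
        self.stripes.append(stripe)


class PersistentWriteBuffer:
    """Stands in for the Optane layer: small writes land here first."""

    def __init__(self, qlc: QLCBackend):
        self.pending = []       # held in persistent memory, so an ack
        self.pending_bytes = 0  # here is already crash-safe
        self.qlc = qlc

    def write(self, data: bytes):
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes >= STRIPE_SIZE:
            self._destage()

    def _destage(self):
        # Coalesce many small writes into one wide stripe so each QLC
        # cell sees few, friendly writes over its lifetime.
        self.qlc.append_stripe(b"".join(self.pending))
        self.pending.clear()
        self.pending_bytes = 0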
"We see our system as an extinction-level event for the hard drive, and we see that [happening] today, not in a few years from now," Vast Data CEO Renen Hallak said.
Vast Data's Universal Storage hardware is just a bunch of flash drives in a 1U chassis, with a top-of-rack Ethernet switch that allows servers to access all storage media. The Universal Storage name doesn't mean it appeals to all enterprises: The minimum Vast Data configuration starts at 1 petabyte (PB). Vast Data storage is for organizations that experience rapid growth in newly created data.
"There's no technological limit that forces us to sell large systems, although our systems performs better the bigger it gets," said Hallak, who formerly helped to engineer all-flash startup XtremIO, which EMC acquired in 2012. Hallak served as an EMC vice president until 2015, the same year Dell Technologies acquired EMC.
Vast Data was founded in 2016 and came out of stealth in February 2019. We spoke with Hallak about the startup's vision to make flash affordable for all storage, including Vast's roadmap for 2020.
What is the biggest change in flash since your time at XtremIO?
Renen Hallak: The first generation of all-flash systems took over the tier zero space, with companies like Pure Storage and Violin Memory. We see Vast Data as part of the next generation of flash. We are collapsing the pyramid. We aren't doing tier 2 flash. From a cost perspective, our goal is to take flash down to the lower levels of the storage hierarchy, while maintaining the performance, density and resilience people expect at the high end.
We're saying, 'You don't need to have multiple tiers of storage.' There is no tradeoff in flash anymore between price and performance. There is an endurance tradeoff, but we leverage the fact there is no price-to-performance tradeoff to build this new type of system.
It's true that flash prices are lower than a few years ago, but flash still costs more on average than disk. What signs are there to indicate that end users want flash for all data?
Hallak: I completely agree with your assessment. When I left XtremIO to start Vast Data, I talked to a lot of customers. I expected them all to say 'We need more performance.' And yet none of them did. They love the characteristics of all-flash systems, but they don't want higher performance. What they want is for flash to be a lot more cost-effective, such that they can use it for the entirety of their applications. The forward-looking ones are [focused on] analytics applications: AI, machine learning, deep learning. The older flash infrastructure only allows fast access to a tiny piece of their capacity.
The way we built our system, and the algorithms we have in place, makes Vast Data cost-competitive with hard drive-based systems. We give users the benefits of a flash and 3D XPoint-based system with much better scale and resilience than disk. The reason we start at 1 PB is that we are focused on solving the challenges with big data.
How does Vast Data compare to software-defined storage or hyper-converged infrastructure?
Hallak: Hyper-converged infrastructure is based on the idea that you don't need separate networks for your LAN, your SAN and your internal storage networks. Ethernet is now fast enough to handle all those workloads as a common fabric. We agree with that. We don't agree that your compute, your networking and your storage device should all be in a single box. The ratio between storage capacity, storage performance, compute, networking and memory will change over time as the application changes, and you can't predict when.
We believe in the disaggregated model. The data center is the new computer, and the compute processes are in containers running across servers. The SSDs are accessible over a network, and the network is becoming the bus. CPUs are on one end of the network, the storage is at the other end, and all the CPUs can see all the storage.
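A toy model of that shared-everything layout, with all names invented for illustration: compute nodes are stateless and hold only a handle to a common drive pool, so any node can serve any request, and adding nodes adds performance without adding capacity.

class DrivePool:
    """Stands in for NVMe-over-Fabrics enclosures reachable by every node."""

    def __init__(self, num_drives: int):
        self.drives = [dict() for _ in range(num_drives)]

    def read(self, drive_id: int, key: str):
        return self.drives[drive_id].get(key)

    def write(self, drive_id: int, key: str, value: bytes):
        self.drives[drive_id][key] = value


class ComputeNode:
    """Stateless: holds no data of its own, only a handle to the pool."""

    def __init__(self, pool: DrivePool):
        self.pool = pool

    def _place(self, key: str) -> int:
        # Deterministic placement: every node computes the same answer.
        return hash(key) % len(self.pool.drives)

    def put(self, key: str, value: bytes):
        self.pool.write(self._place(key), key, value)

    def get(self, key: str):
        return self.pool.read(self._place(key), key)


pool = DrivePool(num_drives=8)
nodes = [ComputeNode(pool) for _ in range(3)]
nodes[0].put("file-a", b"payload")
assert nodes[2].get("file-a") == b"payload"  # every node sees all storage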
What lessons from XtremIO did you bring to bear when building the Vast Universal Storage array?
Hallak: The biggest lesson was around failure handling. At XtremIO, it took us about six months to develop the good path, without doing any failure handling. Then it took us about 3 1/2 years to develop failure handling, and it never actually was perfect. The reason was that we needed to put data in volatile memory for reasons of speed, and then we needed to dump it into persistent media when bad things happened. We just had a lot of processes to run upon a failure event.
At Vast, we leverage 3D XPoint as the new persistent memory, which is much larger than DRAM, almost as fast, and always persistent. It's made available through NVMe over Fabrics over a network so that every server can see it. You don't need to have the nodes talk to each other anymore. This new memory, when used with our software, can enable new architectures that solve old problems.
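Reduced to a sketch, again with invented names rather than Vast Data's implementation, the recovery simplification looks like this: when the write log itself lives in shared persistent memory, a node failure needs no replay protocol between nodes, because every surviving node already sees the same log.

class SharedPersistentLog:
    """Stands in for an Optane-backed log every node can reach over
    NVMe over Fabrics; it survives node failures by assumption."""

    def __init__(self):
        self.entries = []

    def append(self, entry):
        self.entries.append(entry)


class StatelessNode:
    """No private volatile state worth recovering after a crash."""

    def __init__(self, log: SharedPersistentLog):
        self.log = log

    def write(self, key, value):
        # One persistent append replaces the old dance of buffering in
        # DRAM and dumping it to stable media on a failure event.
        self.log.append((key, value))


log = SharedPersistentLog()
node_a, node_b = StatelessNode(log), StatelessNode(log)
node_a.write("k1", "v1")
# If node_a dies here, node_b simply carries on from the same log:
# no node-to-node replay is needed.
node_b.write("k2", "v2")
assert [entry[0] for entry in log.entries] == ["k1", "k2"]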
How do your customers typically buy and scale the Universal Storage system?
Hallak: They buy it as a very large cluster and can keep adding to it. They no longer need a refresh cycle. You can grow capacity independently of performance, because performance bottlenecks have shifted from the media itself up to the CPU. You just need to deploy more software containers to get more performance.
What data features are you planning to add, and what's the timeline?
Hallak: We are expanding into new use cases that require additional functionality. Our last software release, in mid-2019, added snapshots. We're adding replication, and we will be adding SMB support for Windows and encryption.
We are also starting work on the next wave of innovation, which starts with insight generation. We'll start with metadata, and then we'll [expand] to insight generation on the data itself. Storage systems need to go beyond commands of 'read this' or 'write that' to point out similarities between files across the namespace. As an example, the way we use similarity-based data reduction enables us to answer those types of questions.
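Hallak doesn't detail the mechanism, but similarity-based data reduction can be pictured with a toy sketch like the one below, where a deliberately crude similarity hash stands in for whatever Vast Data actually uses: blocks that hash alike are stored as compressed deltas against a shared reference block, which both saves capacity and records which data resembles which.

import zlib


def similarity_sketch(data: bytes, shingle: int = 8) -> int:
    # Min-hash over fixed-size shingles: blocks that are nearly
    # identical tend to produce the same sketch value.
    return min(
        zlib.crc32(data[i:i + shingle])
        for i in range(max(len(data) - shingle, 1))
    )


class SimilarityStore:
    def __init__(self):
        self.refs = {}    # sketch -> reference block
        self.blocks = []  # stored as (sketch, compressed delta)

    def put(self, data: bytes):
        sketch = similarity_sketch(data)
        ref = self.refs.setdefault(sketch, data)
        # Toy assumes equal-length blocks; store only the difference
        # from the reference, which compresses to almost nothing for
        # similar data.
        delta = bytes(a ^ b for a, b in zip(data, ref))
        self.blocks.append((sketch, zlib.compress(delta)))


store = SimilarityStore()
store.put(b"sensor-reading:0001;temp=21.5")
store.put(b"sensor-reading:0002;temp=21.7")  # near-duplicate block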