CW Asia-Pacific

CW APAC: Buyer’s guide to NVMe storage

NVMe for AI: A powerful pairing

NVMe storage capabilities provide the bandwidth and low latency that demanding AI and machine learning applications need to access and manage the massive amounts of data they use.

AI and machine learning systems have long relied on traditional compute architectures and storage technologies to meet their performance needs. But that won't be the case for much longer. Today's AI and machine learning systems -- using GPUs, field-programmable gate arrays and application-specific integrated circuits -- process data much faster than their predecessors.

Meanwhile, the data sets used to train those smart systems have grown progressively larger. To meet these growing demands, adopters are turning to NVMe for AI functionality.

NVMe provides greater bandwidth and lower latency than SAS and SATA, enabling maximum performance for demanding workloads. Machine learning training, for instance, uses millions of data examples to train algorithms so they can make decisions about new data.

"NVMe has moved from the bleeding edge when launched early this decade to the mainstream storage option for AI in 2019," said Jason Echols, senior technical marketing manager at Micron Technology, which offers NVMe SSDs.

Traditional spinning storage has an access time that's three orders of magnitude slower than current NVMe technology, said Scott Schweitzer, director and technology evangelist at Solarflare Communications, which offers technologies designed to accelerate cloud data center applications and electronic trading platforms.

Traditional storage, designed with disk heads reading off a spinning disk, is serial in nature, he said. "The controllers only provide a handful of queues that often map back to the number of heads on the disk," he said. NVMe devices, by contrast, support as many as 64,000 queues, each able to hold as many as 64,000 commands, enabling them to serve tens of thousands of requests for data in parallel.
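
One way to see that parallelism on a Linux host is to count the blk-mq hardware queues the kernel has set up for a device. The sketch below is illustrative only: /sys/block/<dev>/mq is the standard blk-mq location, but the device names are placeholders for whatever exists on your system.

```python
import os

def hw_queue_count(block_device: str) -> int:
    """Count the blk-mq hardware queues exposed for a block device.

    Assumes a Linux host where /sys/block/<dev>/mq/ holds one
    directory per hardware queue, as it does for NVMe devices.
    """
    mq_dir = f"/sys/block/{block_device}/mq"
    if not os.path.isdir(mq_dir):
        return 0  # device is not using blk-mq, or the path layout differs
    return len(os.listdir(mq_dir))

if __name__ == "__main__":
    # Placeholder device names -- substitute your own.
    for dev in ("nvme0n1", "sda"):
        print(dev, hw_queue_count(dev))
```

On most systems the NVMe device will report roughly one queue per CPU core -- well short of the 64,000 the specification allows, but still far more than the single queue a SAS or SATA device typically presents.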

Faster is better

Flash is already a key component in AI platforms that pair high-performance, scale-out storage with GPU-accelerated compute to eliminate I/O bottlenecks and fuel AI insights at scale, said Matthew Hausmann, AI and analytics product marketing manager at Dell EMC. "Faster is always better, so NVMe is a natural progression of these solutions, driving additional performance and moving them closer to real time."

Schweitzer expects NVMe will replace traditional storage in AI environments. AI applications often require enormous data sets, and as applications become more performance-oriented, waiting for traditional disk subsystems quickly becomes the long pole in the computational tent.

"It wasn't but a few years ago that networking was the performance curve on the far right that limited overall system performance," he observed. "As we moved to 10 [Gigabit Ethernet], then 25 GbE and soon 100 GbE and later 400 GbE, networking is rapidly approaching local memory access speeds."

AI applications running on GPU-based systems can use NVMe storage to feed virtually any size GPU farm with far greater performance than traditional storage technologies, said Kirill Shoikhet, chief architect at Excelero, a distributed block storage supplier. "Modern GPUs used in AI and [machine learning] applications have an amazing appetite for data, up to 16 GBps per GPU," he noted. "Starving that appetite with slow storage or wasting time copying data back and forth wastes the most expensive resource you've purchased."
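
The arithmetic behind that appetite is easy to sketch. The figures below are illustrative assumptions rather than measurements; only the 16 GBps-per-GPU number comes from the quote above.

```python
# Back-of-the-envelope check: how much NVMe read bandwidth does a GPU node
# need just to keep its accelerators fed? All figures are illustrative.
gpus = 8                     # GPUs in one hypothetical training node
gb_per_s_per_gpu = 16        # peak ingest rate cited above
drive_gb_per_s = 3.5         # assumed sequential read rate of one NVMe SSD

aggregate = gpus * gb_per_s_per_gpu        # 128 GB/s for the whole node
drives = -(-aggregate // drive_gb_per_s)   # ceiling division
print(f"{aggregate} GB/s aggregate -> at least {int(drives)} NVMe drives")
```

Even with generous assumptions, a single drive cannot come close, which is why these deployments spread reads across many NVMe devices, often over a fabric.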

NVMe for AI use cases

NVMe works well for specific AI use cases, such as training a machine learning model and checkpointing, in which snapshots of the training in progress are saved so it can be resumed. Machine learning includes two phases: training a model on a data set and then actually running the model against new data. "Training a model is the most resource-hungry stage," Shoikhet explained. "Hardware used for this phase -- usually, high-end GPUs or specialized SoCs [systems on a chip] -- is expensive to buy and run, so it should be always busy."

[Figure: The machine learning process, showing how data sets are used to train machine learning applications]

Modern data sets used for model training can be huge. MRI scans, for example, can reach multiple terabytes apiece, and a machine learning training set may require tens or even hundreds of thousands of such images.

"Even if the training itself runs from RAM, the memory should be fed from non-volatile storage," Shoikhet said. Paging out old training data and bringing in the new data should be done as fast as possible to keep the GPUs running. That means latency should be low as well, he said, and for this type of application, NVMe is the only protocol that supports both high bandwidth and low latency.

Checkpoint setting also benefits from NVMe technology. "If a training process is long, the system can choose to save a snapshot of the memory into non-volatile storage to allow a restart from that snapshot in case of a crash," Shoikhet explained. "NVMe storage is very suitable for this kind of usage as well."
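
A framework-agnostic checkpointing sketch, assuming the training state can be pickled and that the checkpoint directory sits on an NVMe-backed file system; the paths and field names are illustrative. The atomic rename avoids leaving a half-written snapshot behind if the process crashes mid-save.

```python
import os
import pickle

def save_checkpoint(state: dict, checkpoint_dir: str, step: int) -> str:
    """Write a training snapshot, then rename it into place atomically."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    final_path = os.path.join(checkpoint_dir, f"ckpt_{step:08d}.pkl")
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())          # ensure the bytes reach the device
    os.replace(tmp_path, final_path)  # atomic on POSIX file systems
    return final_path

# Example: snapshot every 1,000 steps of a hypothetical training loop.
# state = {"step": step, "weights": model_weights}
# save_checkpoint(state, "/mnt/nvme/checkpoints", step)
```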

Potential pitfalls

It's important to fully understand the storage I/O profile of an AI application in order to match the right NVMe SSD to specific needs. "Some AI environments, especially training, are very read-centric, meaning you can realize cost and performance benefits without breaking the bank," Echols said.
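
One way to verify how read-centric a workload actually is: sample the kernel's per-disk I/O counters before and after a training run. The snippet below assumes the psutil package and a Linux block device name such as nvme0n1; both are stand-ins for whatever your environment uses.

```python
import psutil

def io_profile(device: str, run, *args, **kwargs):
    """Report read/write bytes for one block device while `run` executes."""
    before = psutil.disk_io_counters(perdisk=True)[device]
    run(*args, **kwargs)
    after = psutil.disk_io_counters(perdisk=True)[device]
    reads = after.read_bytes - before.read_bytes
    writes = after.write_bytes - before.write_bytes
    total = reads + writes or 1          # avoid division by zero
    print(f"{device}: {reads / 1e9:.2f} GB read, "
          f"{writes / 1e9:.2f} GB written ({100 * reads / total:.0f}% read)")

# Example: io_profile("nvme0n1", train_one_epoch)  # hypothetical function
```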

For all use cases involving NVMe for AI, Hausmann advised steering clear of proprietary NVMe storage technologies and, instead, looking for NVMe that's built into flagship enterprise products. "You might lose a few nanoseconds on paper, but you'll be light years ahead when your system stays up and running and is still supported six months down the road."
