Storage metadata services for AI apps trending at Nvidia GTC
Nvidia's GTC show once again places AI applications at center stage, with storage vendors seeing their platforms' metadata capabilities as a key development component.
Storage vendors are jockeying once more to position their software and hardware as premier offerings for AI workload infrastructure at Nvidia's GPU Technology Conference in San Jose, Calif., this week.
The latest trend among storage vendors is integrating metadata tools and services into their platforms to create unstructured data lakes, which form the core information repositories of AI applications.
Some vendors are looking for unique hardware offerings while others are relying almost exclusively on software, but strategies for performant AI storage are still under development, said Simon Robinson, an analyst at Enterprise Strategy Group, now part of Omdia.
"As ever with AI, it's in a constant state of evolution," Robinson said.
Metadata meta
Metadata is emerging as a key component for many storage offerings focused on AI, with vendors attempting to differentiate capabilities in automation, comprehension and access speeds.
Performant AI applications need faster access to storage memory and software services, but metadata is what adds vital intelligence to the data, said Brent Ellis, an analyst at Forrester Research.
Metadata brings immediate context to data such as the date of creation, whether sensitive information is in the files and which user created the file, he said.
More complicated or specialized tags can form important connections for vector databases, the primary databases used for generative AI applications. Detailed metadata can return information that's buried in hybrid-cloud environments.
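As a rough illustration of the context Ellis describes, the Python sketch below filters a small catalog by creation date, sensitivity flag and tag. The record fields, the `select_for_training` helper and the sample paths are all hypothetical and do not reflect any vendor's product or API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FileMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    path: str
    created: date
    owner: str
    contains_sensitive: bool
    tags: set[str] = field(default_factory=set)


def select_for_training(catalog, required_tag, created_after):
    """Return paths of non-sensitive files that carry the tag and postdate the cutoff."""
    return [
        m.path
        for m in catalog
        if not m.contains_sensitive
        and required_tag in m.tags
        and m.created > created_after
    ]


# Made-up entries spanning cloud and on-premises locations.
catalog = [
    FileMetadata("s3://bucket-a/report.pdf", date(2025, 1, 10), "alice", False, {"finance"}),
    FileMetadata("nfs://onprem/hr.csv", date(2025, 2, 1), "bob", True, {"finance"}),
    FileMetadata("s3://bucket-b/old.txt", date(2023, 5, 5), "carol", False, {"finance"}),
]

selected = select_for_training(catalog, "finance", date(2024, 1, 1))
```

Here only the first entry passes the filter: the second is flagged as sensitive and the third predates the cutoff. The selection is made from metadata alone, without opening any file.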
"The focus on metadata is really important for AI development, applications and models," Ellis said. "When you talk about adding intelligence to the application layer, context matters a lot."
Bringing metadata management into the storage environment could eliminate the need for a separate data management layer, he said. Most metadata markings and tags are universal, so data lakes can be created from a variety of cloud and on-premises sources, regardless of specific hardware or services.
"What's interesting is the particular convergence of this into storage," Ellis said. "It's like adding a feature to the storage layer that wasn't there before. [Metadata is] so useful and is so commoditized that every system needs it."
Software solutions
NetApp and DataDirect Networks (DDN) are two vendors prioritizing metadata for their AI software offerings at and ahead of the GPU Technology Conference (GTC) this year.
The NetApp OnTap hybrid-cloud storage operating system now supports the Nvidia AI Data Platform reference design and is available within several Nvidia reference architectures for systems such as the Nvidia DGX SuperPod or HGX Systems.
Last September, NetApp announced its NetApp Metadata Engine, a new capability within OnTap that will bring disaggregated storage management to NetApp hardware and public clouds for AI workloads. The company has not set a release date for the product.
The importance of metadata discovery and automation will increase in the years to come for AI applications beyond Metadata Engine, said Jonsi Stefansson, CTO and senior vice president at NetApp. Effective metadata cataloging is a way for enterprises to discover useful data for AI regardless of location or department.
"Anybody can reap the benefits of the metadata catalog, no matter what you're trying to do," he said. "The [data storage] silos are killing the advancement of AI."
DDN similarly aims to eliminate silos and orphaned data through DDN Infinia, a software-defined object storage platform with data management and multi-tenancy capabilities.
The Infinia software also forms the basis of DDN's numerous new offerings announced at GTC.
On Tuesday, DDN previewed EXA Fusion, which combines the vendor's ExaScaler parallel file system with the Infinia software. At GTC, DDN also debuted Infinia Ocean, a data automation platform for hybrid-cloud environments.
DDN did not provide specific launch dates for EXA Fusion and Infinia Ocean.
DDN still sells its storage software with hardware appliances, but those appliances use commodity hardware, said Alex Bouzari, co-founder and CEO of DDN.
Bespoke hardware configurations are anathema to AI adoption because shifting best practices and new technologies have customers wanting to avoid architecture lock-in, he said.
"It's just not realistic in the world of AI, where things are moving very fast," Bouzari said. "You have to be doing [AI workloads] on top of existing infrastructure. The hardware layer has been commoditized; it's the software layer that is valuable."
DDN is taking its software ambitions further with IndustrySync, a new service available Tuesday.
IndustrySync offers AI stacks sold through the cloud for a variety of industries, including financial services, life sciences and autonomous driving. These stacks include DDN's Infinia software operating on Nvidia DGX systems through a variety of cloud providers.
Hard-wired for AI
Vendors such as HPE and Pure Storage expect that customers will need specialized hardware to optimize AI services, albeit with some flexibility concessions, according to company spokespeople.
HPE is looking to leverage its wide array of interconnected hybrid-cloud services in GreenLake alongside its storage hardware for an AI private cloud and capitalize on its Nvidia partnership.
The vendor's Alletra Storage MP X10000, HPE's object storage offering, now supports the HPE Data Fabric software that's used as the hybrid-cloud data layer within HPE's Private Cloud AI Software. The MP X10000 also supports automated metadata tagging capabilities for data stored in the platform, according to HPE.
HPE Data Fabric was previously known as HPE Ezmeral Data Fabric.
HPE's block storage offering, Alletra Storage MP B10000, now supports file data on a single array with a disaggregated architecture. Other new enhancements for the B10000 line include a software-defined B10000 cloud storage service in Microsoft Azure and built-in ransomware detection services.
HPE Data Fabric's support for the Alletra Storage MP X10000 will be available by the third quarter of 2025. New features for the MP X10000 and MP B10000 will be available in May.
Pure Storage's new FlashBlade EXA hardware and architecture disaggregate metadata from the stored data itself. The capability was previewed March 11 and is expected to be available later this year, according to the company.
Metadata is stored on dedicated metadata node hardware made by Pure Storage, while the text, images or other data are stored on commodity NVMe hardware connected by a network card.
This structure should enable scalability and consistent performance without bottlenecks around metadata compared to competitor architectures, said Chadd Kenney, vice president of technology at Pure Storage.
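The general idea of keeping metadata and data on separate tiers can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Pure Storage's implementation: the `MetadataNode`, `DataNode` and `Cluster` classes and the checksum-based placement rule are invented for illustration.

```python
import zlib


class MetadataNode:
    """Keeps only the index: object name -> ID of the data node holding the payload."""

    def __init__(self):
        self._index = {}

    def record(self, name, node_id):
        self._index[name] = node_id

    def locate(self, name):
        return self._index[name]


class DataNode:
    """Stands in for a commodity storage node that holds raw payloads."""

    def __init__(self):
        self._blobs = {}

    def put(self, name, payload):
        self._blobs[name] = payload

    def get(self, name):
        return self._blobs[name]


class Cluster:
    """Toy cluster with one metadata tier and several independent data nodes."""

    def __init__(self, data_node_count):
        self.metadata = MetadataNode()
        self.data_nodes = [DataNode() for _ in range(data_node_count)]

    def write(self, name, payload):
        # Deterministic placement by checksum; an arbitrary choice for this sketch.
        node_id = zlib.crc32(name.encode("utf-8")) % len(self.data_nodes)
        self.data_nodes[node_id].put(name, payload)
        self.metadata.record(name, node_id)

    def read(self, name):
        # Step 1: a small lookup against the metadata tier.
        node_id = self.metadata.locate(name)
        # Step 2: fetch the payload from the data tier.
        return self.data_nodes[node_id].get(name)


cluster = Cluster(data_node_count=4)
cluster.write("images/cat.png", b"\x89PNG...")
payload = cluster.read("images/cat.png")
```

In this model the metadata tier never holds payload bytes, so data nodes can be added without changing how lookups work.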
This architecture marks the first time Pure has made its software operate with third-party hardware, although Pure plans to sell its own capacity storage nodes for EXA hardware in the future, he said.
The FlashBlade EXA platform is designed specifically for what Kenney calls AI factory customers: companies or data centers that want to create and resell AI applications, though not at the scale of a hyperscaler cloud.
"A lot of them want to use off-the-shelf hardware," he said. "I think every enterprise is going to edge closer to what these large GPU clouds look like."
He expects a similar storage architecture to become more common among enterprise data centers using AI as well.
Pure's new architecture is similar to one sold by Hammerspace, which offers a software-defined disaggregated storage platform without the need for specific hardware, said Mitch Lewis, research analyst at Futurum Group.
The so-called AI factory customers Pure is chasing are pushing the limits of many vendors' storage platform scalability, resulting in the rise of alternative infrastructures, Lewis said.
"The scale these large, AI-driven companies are going for is pushing the boundaries of their systems," he said.
Tim McCarthy is a news writer for Informa TechTarget, covering cloud and data storage.