Storage vendors adding data management tools to meet AI needs

Storage vendors are developing their own data management tools and capabilities within their platforms as enterprise interest in AI development increases.

Storage vendors are now building and bundling data management capabilities to woo AI infrastructure shoppers.

These data management offerings can range from new capabilities within the storage platform software to features added on top of foundational storage hardware.

AI applications and model development are forcing enterprise customers to reconsider their data storage strategy, said Charles Giancarlo, CEO of Pure Storage, during an earnings call with investors on Wednesday.

The next major opportunity for storage vendors includes tackling the challenges of AI training and inferencing, along with the processes in AI development that refine a model's output for specific tasks using enterprise data, Giancarlo said.

"We see AI as a major prod toward customers rethinking the architecture of their data storage into something much more friendly to AI getting access to data for real time analysis," he said.

Storage buyers may find use in bundling new capabilities with their hardware, especially organizations with fewer data specialists, according to industry analysts. Still, the challenge of vendor lock-in looms large.

"If you look at unstructured data as gold, these companies want to be the miners," said Mary Jander, an analyst at research firm Futuriom. "Storage has become much more important in the AI equation."

Meshing together

What defines data management capabilities differs from vendor to vendor.

Software-defined storage platforms such as Pure Storage's Fusion and DataDirect Networks (DDN) Infinia support high-performance computing and data availability through automation, metadata tagging and API connectivity.

However, other vendors are looking to bring storage infrastructure closer to the duties and tools of data lakehouses, which can add value to their platforms.

Storage hardware and resources have essentially become commodities, said Marc Staimer, president of Dragon Slayer Consulting. Data management tools, such as those offered by Vast Data and AWS, can lead customers to choose one storage platform over another.

"Storage is part of your infrastructure and databases are software infrastructure," Staimer said. "Storage is a commodity in most people's minds, a database is not."

Vast Data, another software-defined storage company, will soon offer an Apache Kafka API-compatible event streaming engine called Vast Event Broker. The capability enables automatic syncing of active and archived data for data warehouse ingestion, but it is limited to Vast customers.
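The Kafka-style pattern that Vast Event Broker is compatible with can be sketched in a few lines: producers append events to named topics, and consumers read from an offset, so a downstream warehouse loader can pick up new records as they arrive. This is a generic, in-memory illustration of the pub/sub model, not Vast's implementation; the topic and event names are hypothetical.

```python
from collections import defaultdict

class EventBroker:
    """Bare-bones, in-memory stand-in for a Kafka-style event broker."""

    def __init__(self):
        # Each topic is an append-only list of events.
        self._topics = defaultdict(list)

    def produce(self, topic, event):
        """Append an event to a topic; return its offset."""
        self._topics[topic].append(event)
        return len(self._topics[topic]) - 1

    def consume(self, topic, offset=0):
        """Return all events at or after `offset`."""
        return self._topics[topic][offset:]

broker = EventBroker()
broker.produce("orders", {"id": 1, "state": "active"})
broker.produce("orders", {"id": 2, "state": "archived"})
print(broker.consume("orders", offset=1))
```

A real broker adds partitioning, durability and consumer groups on top of this model, but the offset-based read is what lets a warehouse ingester resume where it left off.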

Cloud hyperscaler AWS introduced S3 Tables last fall, a new managed service that enables object data stored in AWS S3 buckets to become tabular data in the Apache Iceberg format for data lakes.

AWS also launched S3 Metadata this January, another new feature for S3 to generate object metadata automatically.
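The core idea behind S3 Tables can be illustrated locally: loosely structured objects (here, JSON documents standing in for S3 objects) are normalized into rows under a fixed schema, so analytics engines can query them like a data lake table. This is a rough conceptual sketch, not the AWS API; the schema and records are hypothetical.

```python
import json

# Hypothetical table schema the objects are mapped onto.
SCHEMA = ("order_id", "customer", "total")

def objects_to_rows(raw_objects):
    """Convert JSON object bodies into schema-ordered tuples."""
    rows = []
    for body in raw_objects:
        record = json.loads(body)
        # Missing fields become None, as a table format would allow.
        rows.append(tuple(record.get(col) for col in SCHEMA))
    return rows

objects = [
    '{"order_id": 1, "customer": "acme", "total": 99.5}',
    '{"order_id": 2, "customer": "globex"}',
]
print(objects_to_rows(objects))
```

In the managed service, this mapping is handled by AWS and the resulting tables are maintained in the Apache Iceberg format rather than as in-memory tuples.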

Both services aim to improve interoperability between data stored in the storage platform itself and data lakes and data warehouses, said William McKnight, president of the McKnight Consulting Group.

As enterprises begin inferencing and training AI for specific uses, many will implement data meshes, McKnight said.

A data mesh provides a decentralized data architecture for these AI applications to draw from a wide array of enterprise sources, but it requires storage platforms to operate seamlessly across departments and teams, he said.

"This is all part of the emerging standard of having a data mesh architecture," McKnight said. "If a customer doesn't have to think too much about storage, there's something to that."

Connecting storage and these services together shouldn't significantly affect performance or operations for most data management implementations, said Donald Farmer, principal at TreeHive Strategy. The concern comes from architecting around a given vendor's offering and agreeing to a certain level of lock-in.

"The tighter integration, in theory, should enable optimizations that aren't possible with separate components," Farmer said. "[But] a lot of people are concerned about vendor lock-in."

The trade-off of using a vendor implementation for data management may be worth it if an organization lacks specialized data analysis staff or IT headcount, he said.

Strategy management

However, complications can still arise even as the technology stack shrinks.

As more storage vendors offer data management tools, buyers should ensure they've developed storage and data management strategies, said Simon Robinson, an analyst at Enterprise Strategy Group, now part of Omdia.

Storage itself needs strict rules and guidelines to maintain performance and availability, and to ensure that only needed or useful data flows into the data lakes.

"It's becoming increasingly evident it's not just about storing ones and zeros. You have to have an understanding of the data you're managing," Robinson said.

Enterprise implementation of AI is still in its infancy, Jander said, so best practices and specific tooling are still being determined. Legal quandaries and concerns about data provenance loom large over legal teams, so many projects have yet to reach full production.

"Enterprises are still kicking the tires of AI," she said. "They're not clear on what data to use or what data is legal to use."

Tim McCarthy is a news writer for Informa TechTarget covering cloud and data storage.
