Why the AI era requires an intelligent data infrastructure

NetApp aims to build infrastructure to 'bridge the chasm' that exists between cloud-based AI models and an organization's on-premises data management environment.

Many organizations looking to tap into the power of emerging artificial intelligence capabilities such as generative AI find themselves in a quandary.

Cutting-edge AI capabilities reside chiefly in hyperscale clouds, yet most organizations' key data still resides on premises. As organizations move beyond the experimental stage of AI and begin to think about scaling their AI initiatives, the challenge becomes how to bridge this data-AI divide.

On the surface, the answer seems obvious: Organizations can integrate their own data with cloud-based AI services, enriching existing large language models (LLMs) and other AI applications with their differentiated data and applying industry and domain-specific expertise to create unique, value-driving insights. Indeed, in a recent research study from TechTarget's Enterprise Strategy Group, 84% of respondents said it was important to incorporate their own enterprise data to support their organization's generative AI efforts.
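In practice, the most common way to do this is retrieval-augmented generation, or RAG: Rather than retraining the model, relevant snippets of enterprise data are retrieved at query time and supplied to the LLM as context. The following minimal sketch illustrates the pattern; the toy word-count "embedding" is a stand-in for a real embedding model, and the final hand-off to a hosted LLM is shown simply as a printed prompt.

```python
# Minimal retrieval-augmented generation (RAG) sketch. Illustrative only:
# embed() is a toy bag-of-words stand-in for a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real pipeline would call
    # an embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Enterprise documents stay under the organization's control; only a
# relevant snippet is selected for the prompt.
documents = [
    "Q3 churn rose 4% in the EMEA region after the pricing change.",
    "The warranty policy covers parts and labor for 24 months.",
    "On-call rotation for the storage team changes every Monday.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

question = "How long does the warranty last?"
context = "\n".join(retrieve(question))

# In production, this prompt would be sent to a cloud-hosted LLM; only
# the retrieved snippet crosses the boundary, not the whole corpus.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```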

Yet, achieving this at operational scale is much easier said than done. To start, many organizations are nervous about feeding their valuable data into public clouds. The risks are substantial, from exposing highly confidential proprietary intellectual property to leaking personally identifiable information. Then there's the cost and complexity of managing the process: shuttling large data sets back and forth, creating copies of data sets, keeping models updated with the latest data and keeping track of everything over time.

The broader challenge is that many organizations simply don't have a good enough understanding of their data overall. The problem is compounded by rapid data growth and data fragmentation: Business users and data scientists spend far too much time "data wrangling" -- identifying, gathering and preparing the data required to feed a model -- rendering the overall process unwieldy.

More time spent wrangling means it takes longer to train and deploy models, and longer still to reach the inference stage, which is the entire point of the exercise. The added latency also makes it more likely that inferencing data is out of date, forcing the whole process to be repeated.

Such issues help explain why data management is increasingly recognized as the Achilles' heel of enterprise AI strategy. Enterprise Strategy Group research found a lack of quality data to be the No. 1 challenge organizations encounter when implementing AI. Data, or the lack of it, is rapidly becoming a major roadblock for AI initiatives across the board.

For this reason, NetApp's recent focus on building an "intelligent data infrastructure" for the AI era is noteworthy. The company used its recent Insight 2024 conference to detail a vision to support AI success for its customers. At the center of this multidimensional vision is a pledge to "bridge the chasm" that exists between largely cloud-based AI models and an organization's on-premises data management environment.

NetApp's vision combines multiple elements spanning both on-premises and cloud-based environments, designed to simplify, automate and de-risk the data management workflow that supports enterprise AI at scale.

For example, at Insight, NetApp announced a global metadata namespace, along with innovations to its core ONTAP software, that will let customers explore, classify and manage data across their entire NetApp estate and integrate it directly into AI data pipelines, enabling scalable search and retrieval-augmented generation (RAG) inferencing. Meanwhile, a pending "disaggregated" storage architecture, also part of ONTAP, promises more cost-effective scaling for compute-intensive workloads such as AI model training.
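NetApp hasn't published APIs for the metadata namespace, but the value of rich metadata in an AI pipeline is easy to illustrate: It lets the pipeline scope retrieval to eligible data and keep sensitive classes of data out of prompts. The catalog structure and classification tags below are invented for the example and are not a NetApp interface.

```python
# Hypothetical illustration of metadata-driven pipeline governance. The
# CatalogEntry fields and tag values are assumptions for this sketch.
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    path: str            # location within the storage estate
    classification: str  # e.g., "public", "internal", "pii"
    department: str

catalog = [
    CatalogEntry("/vol1/policies/warranty.txt", "public", "support"),
    CatalogEntry("/vol2/hr/salaries.csv", "pii", "hr"),
    CatalogEntry("/vol1/eng/runbook.md", "internal", "engineering"),
]

def eligible_for_rag(entry: CatalogEntry) -> bool:
    # Policy: never feed PII-classified data into an external model.
    return entry.classification != "pii"

pipeline_inputs = [e.path for e in catalog if eligible_for_rag(e)]
print(pipeline_inputs)  # ['/vol1/policies/warranty.txt', '/vol1/eng/runbook.md']
```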

Crucially, NetApp's AI-enabling vision also extends to the public cloud. The company is already well placed here, with all three hyperscalers offering NetApp capabilities in their clouds as first-party services. NetApp is building on this by developing additional cloud-native capabilities that extend integration with AI-related services: for example, Azure NetApp Files with a range of Microsoft Azure AI services, Amazon FSx for NetApp ONTAP with Amazon Bedrock, and Google Cloud NetApp Volumes with Google Vertex AI and BigQuery.
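To make the shape of such an integration concrete, here is a hedged sketch of reading context from a locally mounted file share -- such as one backed by Amazon FSx for NetApp ONTAP -- and passing it to a model through Amazon Bedrock's invoke_model API. The mount path and model ID are illustrative assumptions; request schemas vary by model provider, and model availability varies by region.

```python
# Sketch: enrich a Bedrock-hosted LLM with data read from a file share.
# MOUNT_PATH and MODEL_ID are illustrative, not prescriptive.
import json
import boto3

MOUNT_PATH = "/mnt/fsx/policies/warranty.txt"  # hypothetical mounted share
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # example model ID

# Read enterprise context from the mounted volume.
with open(MOUNT_PATH) as f:
    context = f.read()

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
response = bedrock.invoke_model(
    modelId=MODEL_ID,
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{
            "role": "user",
            "content": f"Using this context:\n{context}\n\n"
                       "How long does the warranty last?",
        }],
    }),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```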

With these integrations in place, NetApp believes it is uniquely positioned to enable customers to enrich public cloud-based LLMs with their own on-premises data in a secure, scalable and well-defined manner.

For the time being, NetApp's vision remains just that: The on-premises aspects will be delivered over the coming year, though many of the cloud integrations are expected to be available in the coming months. Additionally, NetApp recognizes it can't deliver everything itself, hence the strong emphasis on building an ecosystem of partners spanning hardware -- e.g., Lenovo and Nvidia -- a broad set of software ISVs, and service providers such as Domino Data Lab, in addition to the cloud partners.

With its intelligent data infrastructure and AI vision, NetApp is making a strong statement of intent. It will be fascinating to see how the strategy unfolds over the coming months.

Simon Robinson is principal analyst covering infrastructure at TechTarget's Enterprise Strategy Group.

Enterprise Strategy Group is a division of TechTarget. Its analysts have business relationships with technology vendors.
