
Why the AI era requires an intelligent data infrastructure

NetApp aims to build infrastructure to ‘bridge the chasm’ that exists between cloud-based AI models and an organization’s on-premises data management environment.

Many organizations looking to tap into the power of emerging artificial intelligence capabilities such as GenAI find themselves in a quandary.

Most cutting-edge AI capabilities reside chiefly in hyperscale clouds, yet most organizations' key data still resides on-premises. As organizations move beyond the experimental stage of AI and begin to think about scaling their AI initiatives, the challenge becomes how to bridge this data-AI divide.

On the surface, the solution seems obvious: organizations can integrate their own data with cloud-based AI services and begin to enrich existing LLMs and other AI apps with their own differentiated data, using their industry and other domain-specific expertise to create unique insights that drive value. Indeed, in a recent Enterprise Strategy Group research study, 84% of respondents said it was important to incorporate their own enterprise data to support their organization's GenAI efforts.

Yet achieving this at operational scale is much easier said than done. For a start, many organizations are nervous about feeding their valuable data into public clouds; the risks are substantial, ranging from exposing highly confidential proprietary IP to leaking personally identifiable information. Then there's the cost and complexity of managing the process: shuttling large datasets back and forth, creating copies of datasets, keeping models updated with the latest data, and keeping track of everything over time.

The broader challenge is that many organizations simply don't have a good enough understanding of their data overall. Compounded by issues such as rapid data growth and data fragmentation, business users and data scientists spend far too much time 'data wrangling' -- identifying, gathering and preparing the required data to feed into a model -- rendering the overall process unwieldy.

More time wrangling means it takes longer to train and deploy models, and even longer to get to the inference stage, which is the entire point. Additional latency also makes it more likely that inferencing data is out of date, so the whole process must be repeated.

Such issues help explain why data management is increasingly recognized as the Achilles' heel of enterprise AI strategy. Enterprise Strategy Group research found that a lack of quality data is the number one challenge organizations encounter when implementing AI. Data -- or the lack thereof -- is rapidly becoming a major roadblock hampering AI initiatives at organizations across the board.

For this reason, NetApp's recent focus on building an 'intelligent data infrastructure' for the AI era is noteworthy. The company used its Insight 2024 conference to detail a vision to support AI success for its customers. At the center of this multi-dimensional vision is a pledge to 'bridge the chasm' that exists between largely cloud-based AI models and an organization's on-premises data management environment.

NetApp’s vision will combine multiple aspects spanning both on-premises and cloud-based environments, designed to help simplify, automate and de-risk the data management workflow supporting enterprise AI at scale.

For example, at Insight, NetApp announced a global metadata namespace that, along with innovations to its core ONTAP software, will enable it to explore, classify and manage data across a customer's NetApp estate and integrate directly into AI data pipelines, more effectively enabling scalable searches and RAG inferencing. Meanwhile, a pending 'disaggregated' storage architecture -- also part of ONTAP -- will enable more cost-effective scaling for compute-intensive AI workloads such as model training.
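To make the RAG pattern behind such pipelines concrete, the sketch below shows the retrieval step in miniature: rank a handful of enterprise documents against a query, then assemble a grounded prompt for an LLM. Everything here -- the toy documents, the `retrieve` and `build_prompt` functions, the bag-of-words scoring -- is illustrative and not NetApp's API; a production pipeline would use a vector index and an embedding model instead.

```python
# Illustrative sketch of retrieval-augmented generation (RAG), the pattern
# an enterprise data pipeline feeds: retrieve the most relevant internal
# documents for a query, then ground the LLM prompt in that context.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents sharing the most terms with the query.

    A real system would rank by embedding similarity; simple term overlap
    is used here only to keep the example self-contained.
    """
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(query_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Embed the retrieved context into the prompt sent to the model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

# Toy 'enterprise data' -- in practice this would come from the data estate.
docs = [
    "Q3 revenue grew 12% driven by cloud storage subscriptions.",
    "The on-premises estate holds 4 PB across three data centers.",
    "Employee handbook: travel expenses require manager approval.",
]
context = retrieve("How much data is stored on premises?", docs)
print(build_prompt("How much data is stored on premises?", context))
```

The point of the pattern is that the model answers from retrieved enterprise context rather than from its training data alone -- which is why the quality, freshness and findability of that data matter so much.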

Crucially, NetApp's AI-enabling vision also extends to the public cloud. The company is already well-placed here, with all three hyperscalers offering NetApp capabilities in their clouds as first-party services. NetApp is building on this by developing additional cloud-native capabilities that offer extended integration with AI-related services: for example, Azure NetApp Files with a range of Microsoft Azure AI services, Amazon FSx for NetApp ONTAP with Amazon Bedrock, and Google Cloud NetApp Volumes with Google Vertex AI and BigQuery.

With these integrations in place, NetApp believes it is uniquely placed to enable customers to enrich public cloud-based LLMs with their own on-premises data in a secure, scalable and well-defined manner.

For the time being, NetApp's vision remains just that: the on-premises aspects will be delivered over the coming year, though many of the cloud integrations are expected to be available in the coming months. Additionally, NetApp recognizes it can't deliver everything itself; hence the strong emphasis on building an ecosystem of partners spanning hardware (e.g. Lenovo, NVIDIA), a broad set of software ISVs and service providers such as Domino Data Lab, in addition to the cloud partners.

With its intelligent data infrastructure and AI vision, NetApp is making a strong statement of intent. It will be fascinating to see how the strategy unfolds over the coming months.

Simon Robinson is principal analyst covering infrastructure at TechTarget's Enterprise Strategy Group.

Enterprise Strategy Group has business relationships with technology providers.
