Compare enterprise generative AI deployment options

To pick the best generative AI deployment model for your organization, examine how well cloud and on-premises approaches fit your security, cost, infrastructure and network needs.

With multiple generative AI deployment models to choose from, enterprises must properly evaluate their options or risk problems such as security breaches, excessive costs and integration difficulties.

While most generative AI deployments today are based on large language models, their scope can vary significantly. A nuanced approach to deploying generative AI should consider the size of the model's compute and data requirements, as well as the scale of its intended use case -- ranging from general-purpose consumer models such as ChatGPT to internal tools with limited goals and resource needs.

For any enterprise generative AI initiative, there are four main deployment options to consider: on-premises, private cloud, hybrid cloud and public cloud. Which option to choose depends on factors such as data sensitivity, scalability requirements and capacity for managing infrastructure. Comprehensively evaluating these capabilities and needs is crucial to achieving a smooth and secure generative AI deployment.

Generative AI deployment considerations

When deciding among generative AI deployment approaches, consider how each option stacks up in the critical areas of security, mission requirements, costs, infrastructure, network connectivity and integration.

Data security

While narrow-purpose, enterprise-specific generative AI gets less attention than its bigger consumer counterparts, it's the approach that interests enterprises most, largely because of the data security that comes with self-hosted generative AI deployments.

The strongest business cases for AI involve the information enterprises guard most carefully. As a rule, to lessen the risk of exposing critical business data, never host generative AI in a place where you wouldn't be willing to store the data it uses. An enterprise might not want to risk storing its critical financial data and software source code in the cloud, for example.

Stringent data security requirements can dictate on-premises hosting of generative AI. However, for some applications, it might be possible to host a private instance of generative AI in the public cloud as a transitional strategy or if an on-premises component of the application can properly anonymize queries to reduce security concerns.
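For example, the on-premises anonymization component might look something like the following minimal sketch. The regex patterns, placeholders and function names here are hypothetical; a production deployment would use a vetted PII detection or data classification library tuned to the organization's own data.

```python
import re

# Hypothetical redaction patterns; a real deployment would rely on a
# vetted PII detection library rather than hand-rolled regexes.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b(?:\d[ -]*?){13,16}\b"), "[CARD]"),
]

def anonymize(prompt: str) -> str:
    """Scrub sensitive tokens before the query leaves the premises."""
    for pattern, placeholder in REDACTIONS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

# Only the scrubbed text is forwarded to the cloud-hosted model.
query = "Summarize the dispute raised by jane.doe@example.com on card 4111 1111 1111 1111."
print(anonymize(query))
# Summarize the dispute raised by [EMAIL] on card [CARD].
```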

Mission requirements and costs

As data security requirements lessen, hybrid and public cloud options become more feasible, and mission requirements and costs become more important factors. If generative AI use is highly variable in terms of application, data and usage rates, a public cloud strategy might be best; otherwise, the cost of sizing data center hosting for peak volumes would likely be too high to justify.

However, on-premises hosting is almost certainly less expensive and more responsive if the mission calls for generative AI use by a small number of users at a relatively consistent pace, or if a significant amount of data is regularly sent to or received from the AI model. As users continue to build business use cases for generative AI, on-premises hosting will become more common.
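To see why usage patterns matter, consider a back-of-the-envelope comparison like the sketch below. Every figure is a hypothetical placeholder, not real vendor pricing, but the shape of the math holds: amortized hardware wins at steady utilization, while pay-as-you-go cloud wins for bursty, occasional use.

```python
# All cost figures are hypothetical placeholders, not vendor pricing.
CLOUD_COST_PER_GPU_HOUR = 4.00    # assumed on-demand rate
ONPREM_CAPEX_PER_GPU = 30_000.00  # assumed purchase price
ONPREM_OPEX_PER_GPU_HOUR = 0.50   # assumed power, cooling and staff
AMORTIZATION_MONTHS = 36

def monthly_cost(gpu_count: int, busy_hours_per_month: float) -> dict:
    cloud = gpu_count * busy_hours_per_month * CLOUD_COST_PER_GPU_HOUR
    on_prem = gpu_count * (
        ONPREM_CAPEX_PER_GPU / AMORTIZATION_MONTHS
        + busy_hours_per_month * ONPREM_OPEX_PER_GPU_HOUR
    )
    return {"cloud": round(cloud), "on_prem": round(on_prem)}

# Steady, near-constant use favors owning the hardware ...
print(monthly_cost(gpu_count=8, busy_hours_per_month=700))
# {'cloud': 22400, 'on_prem': 9467}

# ... while bursty, occasional use favors the public cloud.
print(monthly_cost(gpu_count=8, busy_hours_per_month=40))
# {'cloud': 1280, 'on_prem': 6827}
```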

The hybrid cloud option is a valuable transitional strategy. Some generative AI applications combine the requirement to protect confidential data with the need to incorporate broader industry or public data. The major cloud providers' hosted foundation models, as well as the public models from OpenAI, Microsoft and Google, offer APIs to integrate their capabilities with on-premises applications.

Enterprises might want to use this same hybrid model to transition from public-model generative AI validation testing to self-hosting. In either case, whatever company data requires absolute protection can stay on premises.
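As an illustration of this hybrid pattern, the sketch below shows an on-premises application calling a cloud-hosted model through a provider API. It assumes the OpenAI Python SDK, and the model name is a placeholder; any provider's API follows the same general shape.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_public_model(question: str) -> str:
    """Send a non-sensitive, industry-level question to the cloud model."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# Confidential records stay in the on-premises database; only the
# general question crosses the boundary to the public model.
print(ask_public_model("Summarize current retail banking industry trends."))
```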

Infrastructure

Once an enterprise has settled on on-premises or hybrid hosting of generative AI based on security, mission requirements and costs, the next area to address is infrastructure. Generative AI services today rely on massive data centers made up of racks of GPUs. This massive scale is necessitated by the broad mission of LLMs, which are trained on vast amounts of internet data and have highly complex architectures.

When an enterprise needs that breadth of generative AI, public models are the only practical option; no company could justify operations at that scale on its own. But hosting your own generative AI in house will mean adopting a tool that you can scale down to train and operate on company data.

Enterprises might find this requires little more than a rack of GPUs, particularly if the mission doesn't require results in real time. Most use cases for in-house generative AI operate as extensions of business intelligence or analytics applications, which rarely require instantaneous responses. Designing in-house hosting should therefore start with a review of the requirements of the specific generative AI tools available.
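As a sketch of how modest self-hosted inference can be, the example below uses the Hugging Face Transformers library with a deliberately small open model as a stand-in. A real deployment would substitute an open-weight model sized to fit its own GPUs; the point is that batch-style analytics work does not need a hyperscale cluster.

```python
from transformers import pipeline

# distilgpt2 is a stand-in; an enterprise would choose an open-weight
# model sized to the GPUs it actually has.
generator = pipeline("text-generation", model="distilgpt2")

# Analytics-style jobs rarely need real-time responses, so inference
# can run as a batch process on modest on-premises hardware.
result = generator(
    "Quarterly sales in the western region",
    max_new_tokens=40,
)
print(result[0]["generated_text"])
```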

Network connectivity

Network connectivity is especially important in on-premises generative AI hosting. To meet the extensive data processing demands of advanced AI computations, large-scale AI providers often rely on GPU racks connected with InfiniBand. This setup, while complex and costly, offers high bandwidth and low latency -- but most organizations are unlikely to require such extensive infrastructure for their AI applications.

However, an organization deploying generative AI on premises will still need one or more racks of GPUs, and how to connect them is a matter of debate. In enterprise data centers, Ethernet is the dominant networking technology, with major networking companies such as Cisco and Juniper pushing back against InfiniBand in favor of Ethernet. Enterprises should strongly consider connecting their AI data centers through Ethernet to reduce costs and avoid introducing a new technology, since their scale of operations won't justify InfiniBand.

Integration

Integrating AI with other applications, managing the components of a hybrid generative AI deployment and connecting generative AI with its users are other important issues. These points are crucial for network capacity planning -- where traffic volumes are the critical factor -- and for development and security.

Integration starts with a basic but often overlooked question: What is generative AI generating? Free-form queries and responses might be helpful to users, but they're difficult to integrate with other applications. For this reason, public-model generative AI is typically the most difficult to integrate.

Because public cloud, hybrid and on-premises generative AI hosting all tend to rely on the same open source software, and because integration features are included in these packages, integration issues rarely affect the choice of deployment model. Enterprises can further ease integration by having models generate tabular data when the goal is to apply AI to business analytics or to network and IT operations.
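The tabular approach can be as simple as the sketch below, in which the application asks the model for rows in a fixed schema and validates them before they reach an analytics pipeline. The prompt wording, column names and call_model parameter are hypothetical stand-ins for whatever model and schema a deployment actually uses.

```python
import json
from typing import Callable

PROMPT_TEMPLATE = (
    'Return ONLY a JSON array of objects with keys '
    '"region", "quarter" and "revenue_usd". Question: {question}'
)

def query_as_table(question: str, call_model: Callable[[str], str]) -> list[dict]:
    """Ask the model for rows, then validate them before use."""
    raw = call_model(PROMPT_TEMPLATE.format(question=question))
    rows = json.loads(raw)  # fails fast if the model drifts into prose
    for row in rows:
        if set(row) != {"region", "quarter", "revenue_usd"}:
            raise ValueError(f"Unexpected columns: {sorted(row)}")
    return rows

# Validated rows can then feed a BI tool or database like any other
# tabular data source.
```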

Choosing the right deployment option

Public-model generative AI is the best strategy for casual applications such as document development. Public cloud hosting offers the easiest and least expensive way to trial AI in applications that demand more data interchange.

The best business use cases for AI will both consume and generate vast amounts of data when fully utilized. As applications of AI mature, they can also expand from analyzing less-sensitive data to handling information with much higher security requirements.

The best AI strategy is one that supports both public cloud and data center hosting, just as that's the best strategy for application hosting overall. Consequently, enterprises should approach generative AI with the goal of facilitating a migration of AI from the cloud into the data center. Assume that self-hosting will be the eventual strategy, but that starting in the cloud will work for applications that don't require extreme data security measures.

Tom Nolle is founder and principal analyst at Andover Intel, a consulting and analysis firm. By background, Nolle is a programmer, software architect, and manager of software and network products, and he has provided consulting services and technology analysis for decades.
