Google Gemini 2.5 Pro extends on-prem GenAI support

Google Gemini is the first proprietary frontier model that can be run on-premises via Google Distributed Cloud for privacy- and cost-conscious enterprises.

Google Gemini 2.5 Pro will become the first proprietary frontier large language model available for on-premises deployment later this year, including in air-gapped environments.

Other major frontier large language model (LLM) providers, such as Anthropic and OpenAI, do not support on-premises deployments of their latest models. Microsoft's Azure OpenAI Service can extend its cloud APIs on-premises to bring them closer to user data, but it does not offer an on-premises version of the models themselves.

Until now, enterprises whose security, privacy and cost concerns ruled out cloud-based model access via API were limited to open source models such as Meta's Llama and DeepSeek, said Chirag Dekate, an analyst at Gartner.

Google will partner with Nvidia to make Blackwell GPU-based Google Distributed Cloud (GDC) appliances available for on-premises private deployments in the third quarter. A Google press release did not specifically name the company's latest model, Gemini 2.5 Pro, released last month, as part of the package, but it did promise support for Google's "most capable models" and mentioned a 1 million-token context window, which matches Gemini 2.5 Pro.

"Many of our enterprise clients are actively using, evaluating and building around Llama 3 and evaluating Llama 4 … and DeepSeek as well. Nothing's wrong with that," Dekate said. "But when you need enterprise-grade safety, security guardrails, and, more importantly, liability protection and so on, if you want to tap into frontier model innovation, and you're building things on-prem, you are kind of out of luck."

The push for privacy in GenAI

Another industry analyst sees this move by Google as an attempt to counter competition from VMware by Broadcom, which has been emphasizing private cloud as a more cost-effective alternative to public clouds, including for AI workloads.

"Private cloud and on-premises adoption across the cloud-native ecosystem is top of mind for both vendors and enterprises right now, arising in part around the Broadcom acquisition of VMware, which has impacted many platform provider go-to-market strategies in my orbit," wrote Devin Dickerson, an analyst at Forrester Research, in an email. "These technologies will [get] broad adoption in public cloud, but the reality is that on-premises and private cloud environments remain highly relevant as deployment targets, even for modern applications." 

Docker Inc. is another vendor pushing into on-premises LLM deployment, adding support for Google's free and open source Gemma model and Meta's Llama to Docker Desktop 4.40 last week. Docker Desktop's Model Runner packages these LLMs as Open Container Initiative artifacts that developers can pull and run locally on their own machines. Docker will also partner with Google, Continue, Dagger, Qualcomm, Hugging Face, Spring AI and VMware Tanzu AI Solutions to extend local integrations with more AI models and frameworks.
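In practice, models served this way are typically reachable through an OpenAI-compatible HTTP endpoint on the developer's machine, so existing client code can be repointed from a cloud API to localhost. The sketch below illustrates that pattern; the endpoint URL and model tag are assumptions for illustration, not confirmed Model Runner defaults.

    # Minimal sketch: querying a locally served LLM through an
    # OpenAI-compatible chat completions endpoint instead of a cloud API.
    # The base URL and model tag are assumed for illustration; check
    # Docker's Model Runner documentation for the actual values.
    import requests

    LOCAL_ENDPOINT = "http://localhost:12434/engines/v1/chat/completions"  # assumed

    payload = {
        "model": "ai/gemma3",  # hypothetical local model tag
        "messages": [
            {"role": "user", "content": "Summarize our on-prem deployment options."}
        ],
    }

    # The request never leaves the developer's machine -- no cloud
    # round trip, and no per-token cloud API charges.
    response = requests.post(LOCAL_ENDPOINT, json=payload, timeout=120)
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])

Because the endpoint is OpenAI-compatible, the same client code can later be pointed at a hosted service by changing only the URL and model name.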

The cost and context problem

In addition to security and privacy concerns, the costs of relying on cloud APIs and the performance of locally run models are mounting concerns for enterprise developers as they experiment with LLMs, said Nikhil Kaul, vice president of product marketing at Docker and previously head of marketing for Google's cloud-native app development team.

"There's no delay in data transmission to and from the cloud server when you're trying to develop locally, on your own existing hardware," he said. "Typically, if you end up using cloud services, you end up paying for those cloud services."

Dekate said he doesn't expect most enterprise GenAI deployments to run on-premises long-term, but some data and workloads will never be migrated to the cloud.

"Most enterprises are using GenAI to accelerate migration for data that can be migrated to the cloud [and] trying to spend less on legacy data center infrastructures," he said. "But having done that, what many enterprises are realizing is some of the data cannot be moved to the cloud, even if they want to … [But they] need to be able to tap into a common set of innovative models."

Extending beyond public cloud will also help Google Gemini users tap into a more holistic set of data, advancing enterprise GenAI development, Dickerson said.

"The upshot for developers is in the ability to connect [AI] to existing enterprise data and systems -- solving this context problem is far more important for enterprise results with AI than which models they choose," he said. "There's a lot you can do with general-purpose tooling, but the real value for enterprise customers comes when the tools become more context-aware within the software development lifecycle."

Agent2Agent protocol syncs AI agents

While Google Gemini support on GDC extends LLMs beyond the cloud, Google will also extend its AI agents beyond its own product portfolio by supporting Model Context Protocol in its Agentspace enterprise search product -- also newly available on-premises. With partners, Google also kicked off the Agent2Agent protocol project as a proposed standard for agent-to-agent communication.

Model Context Protocol, developed by Anthropic and its partners, is primarily designed to connect AI agents with data sources and other tools, while Agent2Agent is focused on inter-agent communication. Agent2Agent is similar in aim to the Agntcy project launched on March 6 by Cisco, AI agent framework maker LangChain and evaluative AI vendor Galileo. LangChain is also among Google's 50 Agent2Agent protocol partners.
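To make inter-agent communication concrete: under early Agent2Agent drafts, each agent advertises its identity and skills in a JSON "agent card" served at a well-known URL, which a peer fetches before sending it tasks. The sketch below assumes that draft convention; the exact path and field names may differ in the final specification.

    # Minimal sketch of Agent2Agent-style discovery: fetch a peer
    # agent's "agent card" -- a JSON description of its skills and
    # endpoints -- before opening a conversation with it. The
    # well-known path and field names follow early A2A drafts and
    # are assumptions here, not a confirmed final spec.
    import requests

    def fetch_agent_card(agent_base_url: str) -> dict:
        """Retrieve a peer agent's capability card from its well-known path."""
        resp = requests.get(f"{agent_base_url}/.well-known/agent.json", timeout=10)
        resp.raise_for_status()
        return resp.json()

    card = fetch_agent_card("https://agent.example.com")  # hypothetical peer
    print(card.get("name"), "-", card.get("description"))
    print("skills:", [skill.get("id") for skill in card.get("skills", [])])

Discovery via a static card keeps agents loosely coupled: a client only needs a peer's base URL, not advance knowledge of its capabilities.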

All these AI agent protocols are in their infancy. Still, according to one early adopter, the next stage of AI evolution will require better inter-agent orchestration among disparate tools.

"We have not explored how the GenAI tools are going to prioritize or promote answers. If three different tools have answered the same question, which one are they going to use?" said Kasia Wakarecy, vice president of enterprise data and apps at Pythian, a data and analytics services company that partners with Google and uses both Gemini and Atlassian's Rovo agents.

"If someone is asking sales questions, is Salesforce going to be promoted as the answer over a Slack message?" she said in an interview this week. "With enterprise applications, you can find this answer to the same question in five different places, and some will be outdated. So how, generally, will I be able to know if the source is true? … How do I ensure that GenAI knows what we know about our own systems?"

Beth Pariseau, senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.
