Generative AI brings changes to cloud-native platforms

Generative AI took over tech in 2023, and cloud-native platforms are no exception. The need to support LLMs is already affecting CNCF projects, including Kubernetes.

CHICAGO -- Generative AI took center stage -- literally -- at this year's highest-profile cloud-native computing event.

The rapid ascent of generative AI (GenAI) was the focus of the first KubeCon + CloudNativeCon keynote presentation here this week, which addressed how enterprise generative AI adoption already draws on cloud-native platforms and how those platforms must change to better accommodate such workloads.

"With this scale of innovation, cloud native needs to keep evolving to be at the cutting edge," said Priyanka Sharma, executive director at the Cloud Native Computing Foundation (CNCF) during her keynote talk. "The stakes are high, because anyone innovating big GenAI applications is particularly sensitive about their data security and privacy. As a result, a lot of LLM stack vendors need to be able to deploy to customers VPCs [virtual private clouds] on their Kubernetes clusters … so that they feel more secure."

Cloud-native platforms, built to facilitate rapid software development and rapid absorption of new infrastructure utilities, already support generative AI apps and large language models (LLMs) at companies such as Adobe and Intuit.

"We have built something called GenOS, which applies constraints before our data goes to the LLM. … We have to make sure PII data is not going out," said Mukulika Kapas, director of product management for Intuit's Modern SaaS platform. "Also, with ChatGPT, we don't open it directly to developers, because initially people make mistakes. We have [platform] guardrails there, as well."

Generative AI puts a new twist on MLOps

Supporting high-performance computing (HPC) and forms of AI such as machine learning isn't new to the cloud-native or enterprise IT world, either. Machine learning operations, or MLOps, is already an established discipline, practiced for years with cloud-native platform tools such as Kubeflow at organizations including the European Organization for Nuclear Research (CERN).

The Kubeflow project gave physicists who were training deep learning models access to distributed cloud infrastructure in Kubernetes clusters beginning in 2018, said Ricardo Rocha, computing engineer at CERN and a member of the CNCF Technical Oversight Committee, during a presentation this week. Kubeflow was created at Google and donated to CNCF this year.

"Because we were using this sort of technology, we had access to some of the public cloud resources quite easily so we started trying things like using GPUs as well," Rocha said. "Not only for scaling, but also to evaluate costs, and we could see which resources were actually better for us."

But generative AI has some unique characteristics as a workload that set it apart from past generations of HPC and AI, said Kevin Klues, a distinguished engineer at Nvidia, during a keynote panel presentation. GenAI is artificial intelligence that can be used to generate new content, including programming code.

"The conventional thinking has always been that for [AI model] training you need big, beefy GPUs, but for inference you need smaller GPUs or even a fraction of a GPU. With the introduction of LLMs, that's not even true," Klues said. "If you want to do inference on LLMs, you need these big, beefy GPUs. So just trying to find the right size GPU for your workload is a challenge in Kubernetes, amongst some of the other challenges of, once you do have access to it, how do you control sharing those GPUs across different workloads?"

GPUs have become a sought-after commodity in the age of cryptocurrency, and generative AI has made them an even scarcer resource. Ordering on-site GPUs can take months, and cloud GPUs can come at a huge pricing premium because of demand, according to Rocha's presentation.

GPUs also draw large amounts of electrical power in data centers, surpassing even the cryptocurrency mining workloads that have already alarmed environmentalists over their contribution to carbon emissions and global climate change. AI workloads account for about 8% of data center power draw this year and are expected to reach 15% to 20% by 2028, due in part to the higher processor demand of LLM inferencing compared with other AI workloads, according to a Schneider Electric white paper Rocha cited in his presentation.

"When you start looking at the GPUs and their power use, you're talking bitcoin mining but bigger," said Marlow Weston, cloud software architect at Intel Corp. and chair of the CNCF Environmental Sustainability Technical Advisory Group, during the keynote panel presentation. "We need to find ways to optimize for power so that people running the data centers or people running locally are minimizing the amount of power usage and maybe just powering up [GPUs] just when they're using them."

A generative AI panel onstage at KubeCon + CloudNativeCon 2023 this week included, from left, Tim Hockin, distinguished software engineer at Google; Marlow Weston, cloud software architect at Intel; Kevin Klues, a distinguished engineer at Nvidia; and Joseph Sandoval, principal product manager at Adobe.

Kubernetes management adjusts to generative AI

More efficient use of GPUs, in terms of both electrical power and shared cloud infrastructure resources, is now a top priority for Kubernetes, according to Tim Hockin, distinguished software engineer at Google and a Kubernetes maintainer.

"AI/ML workloads are a little different than the things that Kubernetes has built to support for the last 10 years," Hockin said during the keynote panel presentation. "It's really changing our relationship with hardware, and so we think hard about how we're going to manage that in scheduling and resource management and performance management."

Intel and Nvidia worked together on dynamic resource allocation, a new API for resource management introduced in Kubernetes 1.26 in late 2022. Dynamic resource allocation can enable organizations to assign workloads to GPUs more efficiently, including sharing a single GPU among multiple workloads or allocating a fraction of a GPU rather than dedicating the entire processor to one workload.

"It's becoming the new way of doing resource allocation in Kubernetes," Klues said during the panel presentation. "[But] there's lots of challenges with that. It's still in alpha form. And there's some cost that comes with doing things this way that we, as a community, need to kind of rally around and figure out how to solve."

Other cloud-native platform tools in areas such as observability are also still catching up to LLMs, according to Intuit's Kapas, in a separate interview here this week.

"LLM observability is still in early stages," she said. "With generative AI chat apps, we need to monitor the entire conversation to understand response quality and build a feedback loop. So we have to be very careful about storing that data with encryption for data privacy."

Participants at a keynote media panel at KubeCon + CloudNativeCon 2023 this week included, from left, Taylor Dolezal from CNCF, Mukulika Kapas from Intuit, Sky Grammas from Cruise, Angel Diaz from Discover Financial Services, Samith Gunasekara from Boeing and Mel Cone from The New York Times.

Cloud-native, open source orgs address AI safety

Officials at the Linux Foundation, CNCF and Open Source Security Foundation (OpenSSF) this week also addressed a broader argument that has emerged alongside the growth of generative AI: the claim that open source AI data sets and models present untenable safety risks.

"Our belief is that this is a form of regulatory capture by incumbents in that market," said Jim Zemlin, executive director of the Linux Foundation, during this week's KubeCon press conference. "In order to make sure that foundation models and AI technology is not used in a terrible way. You need three things: You need transparency, trust and attribution. And open source provides those things."

Open source LLM hub Hugging Face recently joined the PyTorch Foundation, an offshoot of the Linux Foundation focused on deep learning. The OpenSSF also plans to address AI model transparency, its new governing board chair, Arun Gupta, said in a separate interview this week. Gupta is also vice president and general manager of open ecosystem initiatives at Intel Corp. and chair of the CNCF governing board.

"Open in AI is a much bigger problem than just the open source part of it," Gupta said. "When you think about open in AI, you're thinking in terms of open source, open data, open model, open weight and infrastructure. … There are four fundamental freedoms of open source, but AI potentially adds a fifth element of verifiability."

Some enterprise IT presenters here this week appeared willing to explore open source AI.

"When we take from open source, we make it more relevant to the aerospace industry, adding on top what's needed to be specialized for the domain," said Samith Gunasekara, head of The Boeing Software Factory development, security and operations, during the media panel presentation. "I look at generative AI as the same narrative. There's going to be generally what I want, but not 100%. … We'll take a general model and [customize] it for our industry, and that's how we as an industry can move forward, creating better and better models."

In the meantime, LLM vendors such as Cohere are using tools from cloud-native platform projects such as OCI Registry as Storage (ORAS) to make LLMs available for security-conscious customers' private Kubernetes clusters, according to a session presentation here this week.

"This is similar to when, as an industry, we started moving into the cloud. All the same reasons that people gave for, 'I can't move out of my collocated facility to the cloud' are the exact same reasons we're hearing for why 'I need to run a private LLM and don't want to use a SaaS system,'" said Autumn Moulder, director of infrastructure and security at Cohere, a generative AI vendor in Toronto. "But they also don't have the expertise to run a fully open source [LLM] stack."

ORAS lets Cohere deliver its LLM and associated application as static files via an Open Container Initiative container registry, which users can easily import into their on-premises Kubernetes clusters, said Marwan Ahmed, a member of the technical staff at Cohere, who co-presented with Moulder. The approach also addresses security concerns related to the integrity and provenance of those files.

"The main benefit is we're able to ship a solution really quickly. It turns out that all you really need to stand up a Kubernetes environment is a container registry, more or less," Ahmed said. "Container registries and registries in general are content-addressable. … [ORAS] provided a way to ensure the authenticity and integrity of the image objects, because every layer has its own digital fingerprint."

Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached at [email protected] or on Twitter @PariseauTT.
