
GPU scarcity shifts focus to GPUaaS

High GPU costs and scarcity drive users to GPUaaS for AI workloads. But businesses should assess needs before investing.

GPUs have emerged as key hardware for running most high-end AI workloads. But high costs and limited supply are leading customers to seek other ways to access the processors.

Several vendors -- from providers focused solely on GPUs as a service, to public cloud providers, to OEMs -- are taking different approaches to giving customers this access. One sign of growing interest and a growing market comes from the larger GPU as a service (GPUaaS) providers, such as CoreWeave and Lambda Labs: CoreWeave is planning an IPO in the coming months, and Lambda is launching AI grants, with both moves aimed at expanding their core offerings.

Other iterations of GPUaaS from the likes of Lenovo and Rackspace have also been cropping up in the enterprise. But analysts see them more as ways to use scarce resources than as pure GPUaaS offerings.

While the concept dates to the 1970s, the term GPU wasn't introduced until 1994, when Sony used it to describe the chip that rendered 3D graphics in its PlayStation video game console. In 2006, Nvidia introduced its parallel computing platform, CUDA, which found its way into machine learning and AI use cases a short time later. Today, Nvidia is the dominant supplier of data center GPUs.

In 2022, OpenAI released ChatGPT, an AI chatbot that understands and generates natural language, setting off the generative AI craze. AI workloads rely on large datasets, which GPUs handle more efficiently than CPUs because they process data in parallel rather than sequentially.
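To illustrate the difference, here is a minimal sketch -- using PyTorch purely as an example framework, which the article itself doesn't reference -- that times the same matrix multiplication on a CPU and, if one is available, on a GPU:

    import time
    import torch

    def time_matmul(device: str, n: int = 4096) -> float:
        # Time one n x n matrix multiplication on the given device.
        a = torch.randn(n, n, device=device)
        b = torch.randn(n, n, device=device)
        start = time.perf_counter()
        _ = a @ b
        if device == "cuda":
            torch.cuda.synchronize()  # wait for the asynchronous GPU kernel to finish
        return time.perf_counter() - start

    print(f"CPU: {time_matmul('cpu'):.3f}s")
    if torch.cuda.is_available():
        print(f"GPU: {time_matmul('cuda'):.3f}s")  # typically far faster

The GPU spreads the multiplication across thousands of cores at once, while the CPU works through far fewer operations at a time.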

The rush to use data center GPUs for AI training created a supply-and-demand imbalance in the market, according to Chirag Dekate, an analyst at Gartner. Nvidia, for its part, will supply its primary customers first.

"The primary customer, at least in today's demand-driven and supply-constrained market, is actually hyperscalers," Dekate said.

Flavors of GPUaaS

For Dekate, GPUaaS is a cloud-based service that provides access to GPUs on demand. Vendors such as Lambda Labs and CoreWeave, which have strategic relationships with Nvidia and tens of thousands of GPUs, offer such a service and primarily serve specific customers, including hyperscalers that need burst capacity and model builders such as Cohere or OpenAI.

Using these services comes with its own limitations, Dekate said. While GPUaaS providers can supply the compute, they don't necessarily offer a full technology stack for AI workloads.

When thinking about pure GPUaaS offerings, he said, "It's like looking at the night sky through a telescope. What you see is beautiful. What you see is an Nvidia-centric ecosystem, and that's beautiful. But you don't get the big picture. You miss out on the broader beauty of the night sky."

Hyperscalers, on the other hand, provide customers with a full stack of technology and services in addition to GPU access. But customers must operate within that ecosystem, he said.

Other vendors, like Lenovo and Rackspace, have also used the GPUaaS label to describe their new offerings. In September, Lenovo added GPU access to its TruScale infrastructure-as-a-service platform, providing metered GPU resources for AI model training. In November, Rackspace introduced its own GPUaaS to address the growing demand for AI workloads, taking a unique auction approach in which customers bid for access to the limited resources.

But Dekate said what they're really doing is providing a private cloud-type service that includes access to GPUs from an existing data center, rather than access to a GPU cloud like CoreWeave's or GPU services beyond what the customer already has.

"They allow you to create a private cloud on-prem, GPU-type offerings or private AI offering," he said.

Still, Lenovo's service enables customers to access GPUs at a lower cost than purchasing the processing units outright, said Russ Fellows, an analyst at The Futurum Group. Those customers also don't have to belong to a particular cloud ecosystem or have deep knowledge of the larger GPUaaS providers.

Scarcity and cost

GPUaaS is an option, but users can still put GPUs on premises if needed, for a cost, Fellows said. Individual GPUs carry high retail prices, so users should look closely at their actual usage first.

"You really only need larger GPU clusters on smaller occasions, like if you're doing training or fine tuning," he said.


Dekate agreed that enterprises need to consider the use cases before determining if an on-premises or an as-a-service option is best for them.

"The question that they should be asking is, 'What are these GPUs going to be used for? What is the ROI? What is the impact of these systems?" he asked.

GPU scarcity can create a fear-of-missing-out mindset around procuring GPUs, which in turn leads to further scarcity, Dekate said. Moreover, most enterprises can use generative AI without large numbers of high-end GPUs. For example, small language models, which handle natural language with far fewer parameters than an LLM, can run on CPUs.
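As a rough illustration of that point, a small model can be loaded and run entirely on a CPU with Hugging Face's Transformers library; the model named below is just a small example model, not one mentioned in the article:

    from transformers import pipeline

    # device=-1 pins the pipeline to the CPU; no GPU is required.
    # "distilgpt2" is a small example model, chosen only for illustration.
    generator = pipeline("text-generation", model="distilgpt2", device=-1)

    result = generator("GPU scarcity means that", max_new_tokens=20)
    print(result[0]["generated_text"])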

Customers can also underestimate how difficult it is to stand up and maintain a GPU cluster, according to Dekate. Doing so requires a specific skill set that might be uncommon in the enterprise, and GPUs can fail for several reasons, including component, memory and driver failures.

"For some of these, you can just restart your server node, and things will work back up. But if you have component failures, then you cannot." Dekate said.

After the hype

It won't always be difficult to get hold of GPUs, Fellows said. The next few years might prove difficult, but the need for massive numbers of GPUs to train new foundational LLMs on the same data will dissipate.

"It is just going to be redistribution," he said. "Those GPUs aren't going to not be needed."

Companies will need fewer GPUs for inferencing or small language models, Fellows said.

"Does everyone need 100 GPUs? No," he said, adding that, "A small company could probably get by on using one or two GPUs to run inferencing. If they keep getting more powerful, a mid-sized company could get by on a couple of eight-node clusters."

There will also be room for GPUaaS, he said, whether in the strict sense or the private cloud sense, depending on the size of the data and what is being done with it. Something like batching -- dividing large datasets into smaller chunks for training -- doesn't need to run continuously.

"GPU as a service could be useful for that part of the workflow," Fellows said.

Adam Armstrong is a TechTarget Editorial news writer covering file and block storage hardware and private clouds. He previously worked at StorageReview.
