
Cerebras' AI inference tool challenges Nvidia but faces hurdles

The hardware vendor says its new inference offering outperforms Nvidia's GPU-based tools. However, Nvidia dominates the market.

AI hardware startup Cerebras Systems' new AI inference tool could challenge Nvidia's GPU offerings, but the vendor faces many hurdles in winning over enterprises.

On Tuesday, the AI vendor introduced Cerebras Inference, a new product that delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B. Cerebras said the tool is faster than Nvidia GPU-based offerings running in hyperscale clouds.
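To put those throughput figures in context, a back-of-the-envelope calculation shows what they imply for response latency. A minimal sketch, where the 500-token response length is an illustrative assumption rather than a figure from Cerebras:

```python
# Rough generation-time estimates at the throughputs Cerebras quotes.
# The 500-token response length is an illustrative assumption.
for model, tokens_per_sec in [("Llama 3.1 8B", 1800), ("Llama 3.1 70B", 450)]:
    response_tokens = 500
    seconds = response_tokens / tokens_per_sec
    print(f"{model}: ~{seconds:.2f} s to generate {response_tokens} tokens")
```

At the quoted rates, a 500-token response would take roughly 0.28 seconds on the 8B model and about 1.1 seconds on the 70B model.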

It is powered by Cerebras' Wafer-Scale Engine and costs less than GPU-based offerings, the AI vendor said.

Change in the market

Cerebras Inference reflects a shift in the generative AI market, according to Arun Chandrasekaran, an analyst at Gartner.

In the initial stage of the generative AI hype, there was a lot of emphasis on training. Now, the market is shifting toward the cost and performance of inferencing, he said.

"It is also a sign that AI use cases are starting to proliferate and expand into the enterprise," Chandrasekaran said. "Which is why the innovation is not just happening in the training aspect of it. It's happening in the inferencing aspect of it."

As GenAI use cases grow in the enterprise, the performance of inferencing is becoming more important, providing an opportunity for vendors such as Cerebras, Chandrasekaran said. However, the opportunity also extends to specialized cloud providers that are emerging with their own chips and offering open source models on top of them.

Therefore, while Cerebras can differentiate itself on performance and might even outperform Nvidia, it will also have to compete with hyperscalers such as Microsoft, AWS and Google, as well as specialized inferencing providers such as Groq, which recently raised $640 million.

Cerebras vs. Nvidia

While Cerebras seems to have come up with "a more efficient, more elegant way to deliver the performance from a hardware perspective and engineering perspective," Nvidia's software and hardware stack dominates the market and is easy for enterprises to use, Futurum Group analyst David Nicholson said.

Cerebras' wafer-scale system can deliver the performance AI workloads need at a higher level, and at a lower cost, than Nvidia can, he added.

"The question is the broader ecosystem," Nicholson continued. "Are people willing to engineer what they need to do so that it will work with the Cerebras system?"

Many enterprises might find they can get better performance at lower cost with the Cerebras system than with the off-the-shelf systems Nvidia provides, he said.

"The real question is ... how much of the market will gravitate toward the best way to do this, versus the most widely adopted, easiest to deploy?" he added. "Cerebras has a very large barrier to entry here, where Nvidia has such a dominant market share."

Thus, enterprises will likely choose between Nvidia and a vendor like Cerebras based on scale, Nicholson said. A small enterprise will likely lean toward Nvidia, while a well-capitalized company looking to scale its AI workflows might lean toward Cerebras.

Cerebras Inference is now available via chat and API access.
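For API users, a minimal sketch of a request follows, assuming the service exposes an OpenAI-style chat-completions endpoint; the URL, model identifier, and environment variable below are illustrative assumptions, not details confirmed in this article:

```python
# A minimal sketch of calling Cerebras Inference over HTTP.
# Endpoint URL, model name, and env var are assumptions for illustration.
import os
import requests

API_URL = "https://api.cerebras.ai/v1/chat/completions"  # assumed endpoint
API_KEY = os.environ["CEREBRAS_API_KEY"]  # hypothetical env variable

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama3.1-8b",  # assumed model identifier
        "messages": [{"role": "user", "content": "Summarize wafer-scale inference."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```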

Esther Ajao is a TechTarget Editorial news writer and podcast host covering artificial intelligence software and systems.
