Examining hype about Chinese AI startup DeepSeek

The vendor released a new reasoning model it claims it developed cheaply in part by not using as many Nvidia chips. It will likely face challenges in the U.S. market.

A Chinese AI vendor's new large language model is making technology vendors in the U.S. rethink the development and training of generative AI reasoning models.

On Jan. 20, DeepSeek introduced its first generation of reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. Since its release, DeepSeek's AI assistant has taken the top spot from OpenAI's ChatGPT as the most downloaded free app on iOS.

The new LLM's immediate worldwide popularity sent AI chipmakers' stocks, particularly those of AI chip giant Nvidia, plummeting as tech investors lost confidence in U.S. AI vendors. Nvidia lost 17% of its value Monday, wiping out $589 billion of its market capitalization, while the tech-heavy Nasdaq composite dropped 3%.

DeepSeek-R1-Zero is a model trained with reinforcement learning, a type of machine learning that trains an AI system to perform a desired action by rewarding desired behaviors and penalizing undesired ones. DeepSeek-R1 is a version of DeepSeek-R1-Zero that addresses its predecessor's poor readability and language mixing, according to the AI startup.
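The reward-and-penalty loop described above can be sketched in a few lines. This is a deliberately toy illustration of the general reinforcement learning idea, not DeepSeek's actual training pipeline; the actions, reward function and learning rate here are all invented for the example.

```python
import random

# Toy reinforcement learning loop: an agent tries actions, receives a
# reward signal, and shifts its preferences toward actions that score well.
actions = ["desired", "undesired"]
prefs = {a: 0.0 for a in actions}  # learned preference score per action

def reward(action):
    # Hypothetical reward signal: +1 for the desired behavior, -1 otherwise
    return 1.0 if action == "desired" else -1.0

random.seed(0)
lr = 0.1  # learning rate
for _ in range(200):
    # Mostly exploit the currently preferred action, sometimes explore
    if random.random() > 0.2:
        a = max(prefs, key=prefs.get)
    else:
        a = random.choice(actions)
    # Nudge the preference for the chosen action toward its observed reward
    prefs[a] += lr * (reward(a) - prefs[a])

print(max(prefs, key=prefs.get))  # the learned policy favors the rewarded action
```

Real systems replace the lookup table with a neural network and the hand-written reward with a learned or rule-based reward model, but the feedback loop is the same shape.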

Reasoning and open source

DeepSeek-R1 is comparable to OpenAI o1 models in performing reasoning tasks, the startup said. Both DeepSeek models have 671 billion parameters.

The models were released as open source, continuing the interplay between open source and closed source models. Meta's Llama family of open models has become widely popular as enterprises look to fine-tune models to use with their own private data, and that popularity has spawned increasing demand for open source generative AI systems.

Founded in 2023, DeepSeek innovated out of necessity: the U.S. government's restrictions on Chinese access to top AI chips left Chinese companies with an infrastructure problem to solve.

Given those hardware restrictions, it is impressive that DeepSeek inexpensively built an open source model whose reasoning performance rivals established models from big AI vendors, Gartner analyst Arun Chandrasekaran said.

"The conventional thinking was that LLMs are getting commoditized, so the future is building more reasoning models," he said.

In line with that trend, Google in December introduced Gemini 2.0, which included reasoning capabilities. The models in the OpenAI o1 series have also been trained with reinforcement learning to perform complex reasoning.

Despite prominent vendors introducing reasoning models, it was expected that few vendors could build that class of models, Chandrasekaran said.

"Nobody saw a Chinese company actually coming up with a ... reasoning model," he said. "That in itself is really noteworthy."

DeepSeek's ability to take any LLM and turn it into a reasoning model using various models and techniques is also innovative, Futurum Group analyst Nick Patience said.

The excitement about DeepSeek also comes from a need for AI models that consume less power and cost less to run, said Mark Beccue, an analyst at Enterprise Strategy Group, now part of Omdia.

DeepSeek said it trained its latest model for two months at a cost of less than $6 million. By comparison, the cost to train OpenAI's biggest model, GPT-4, was about $100 million.

"Models must become cheaper to run, and they must become more accurate in order for GenAI to scale for enterprise," Beccue said. "In terms of running cheaper, model makers, the chipmakers, and other hardware manufacturers and data center players all know this and are working toward that goal. It's up to the model makers to deliver more accurate AI responses."

Constraints and innovations

But some observers are skeptical that the vendor trained and ran inference on its model as cheaply as the startup -- which originated as a hedge fund -- claims, Chandrasekaran said.

"One theory is that constraints often create innovations," he said, adding that DeepSeek's lack of access to GPUs could have forced the vendor to create an innovative technology without accruing the cost of modern, expensive GPUs. "The other way to think about [DeepSeek] is that we don't know the infrastructure that it is trained on."

DeepSeek is not the only AI vendor or technology company in China that could turn limitations into innovation, Patience said.

"When you put these constraints on a country so large with much understanding of how to build electronics ... you could see them eventually getting to the stage where they're going to be building their own GPU competitors," he said. "That's a long way out, but I suspect that will happen."

Some challenges

Despite the public attention on DeepSeek and its well-performing reasoning model, the likelihood that it can compete long-term against the likes of dominant generative AI players OpenAI, Nvidia and Google is slim, Patience added.

For one, DeepSeek could face restrictions in the U.S. market placed on businesses that want to work with Chinese companies.

Another challenge is sustainability, Chandrasekaran said. While the vendor is basking in the public eye at the moment, the fast-moving AI market could relegate the vendor to the sidelines within a few weeks to a few months.

It's also unclear if DeepSeek can continue building lean, high-performance models.

"Do they have the motivation to do that? Second, do they have the money to do it? And third, can they actually monetize what they've done?" Chandrasekaran said. The AI vendor will face challenges in convincing cloud providers to offer its model as a service, or even in building a developer ecosystem for it, he added.

Meanwhile, DeepSeek could soon try to monetize its currently free service by selling API access.

The vendor also recently faced a security challenge: over the past few days, it was hit with malicious cyberattacks, forcing it to limit new user registration.

Despite the challenges it is bound to face in the U.S. market, DeepSeek has piqued the interest of the U.S. tech industry.

"The bigger theme here is really about building highly capable lean models," Chandrasekaran said. "There are certainly lessons that research labs and other ecosystems in the U.S. can learn from this."

In response to DeepSeek, Nvidia sent the following statement to media outlets, implying that companies like DeepSeek will need more Nvidia GPUs:

DeepSeek is an excellent AI advancement and a perfect example of test-time scaling. DeepSeek's work illustrates how new models can be created using that technique, leveraging widely available models and compute that is fully export control compliant. Inference requires significant numbers of Nvidia GPUs and high-performance networking. We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling.

Test-time scaling enables smaller models to achieve better performance during inferencing.
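One common form of test-time scaling can be sketched as best-of-n sampling: instead of accepting a single generation, spend extra inference compute drawing several candidate answers and keep the one a scoring function likes best. The "model" and "verifier" below are numeric stand-ins invented for illustration, not any vendor's actual system.

```python
import random

random.seed(42)

def sample_answer():
    # Stand-in for one model generation: a noisy guess at the target value 100
    return 100 + random.gauss(0, 10)

def score(answer):
    # Hypothetical verifier: higher is better (closer to the target)
    return -abs(answer - 100)

def best_of_n(n):
    # More samples (i.e., more test-time compute) -> better expected answer
    return max((sample_answer() for _ in range(n)), key=score)

# Compare average error of a single generation vs. best-of-32 over many trials
trials = 200
avg_err_1 = sum(abs(best_of_n(1) - 100) for _ in range(trials)) / trials
avg_err_32 = sum(abs(best_of_n(32) - 100) for _ in range(trials)) / trials
print(avg_err_32 < avg_err_1)  # extra inference compute reduces average error
```

The same weights produce better answers simply because more compute is spent at inference time, which is why test-time scaling is counted as a third scaling law alongside pre-training and post-training.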

Esther Shittu is an Informa TechTarget news writer and podcast host covering artificial intelligence software and systems.
