DeepSeek explained: Everything you need to know
DeepSeek, a Chinese AI firm, is disrupting the industry with its low-cost, open source large language models, challenging U.S. tech giants.
In the world of AI, there has been a prevailing notion that developing leading-edge large language models requires significant technical and financial resources. That's one of the main reasons why the U.S. government pledged to support the $500 billion Stargate Project announced by President Donald Trump.
But Chinese AI development firm DeepSeek has disrupted that notion. On Jan. 20, 2025, DeepSeek released its R1 LLM, developed at a fraction of the cost other vendors have incurred for comparable models. DeepSeek also provides its R1 models under an open source license, enabling free use.
Within days of its release, the DeepSeek AI assistant -- a mobile app that provides a chatbot interface for DeepSeek R1 -- hit the top of Apple's App Store chart, outranking OpenAI's ChatGPT mobile app. The meteoric rise of DeepSeek in terms of usage and popularity triggered a stock market sell-off on Jan. 27, 2025, as investors cast doubt on the value of large AI vendors based in the U.S., including Nvidia. Microsoft, Meta Platforms, Oracle, Broadcom and other tech giants also saw significant drops as investors reassessed AI valuations.
What is DeepSeek?
DeepSeek is an AI development firm based in Hangzhou, China. The company was founded by Liang Wenfeng, a graduate of Zhejiang University, in May 2023. Liang also co-founded High-Flyer, a China-based quantitative hedge fund that owns DeepSeek. DeepSeek operates as an independent AI research lab under the umbrella of High-Flyer. The full amount of funding and the valuation of DeepSeek have not been publicly disclosed.
DeepSeek focuses on developing open source LLMs. The company released its first model in November 2023 and has since iterated multiple times on its core LLM, building out several variations. However, it wasn't until the January 2025 release of its R1 reasoning model that the company gained global attention.
The company provides multiple services for its models, including a web interface, mobile application and API access.
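As an illustration of that API access, the sketch below builds a chat request payload. Public reports describe DeepSeek's API as OpenAI-compatible; the endpoint URL and model name here are assumptions based on those reports, not verified documentation.

```python
import json

# Hypothetical request to DeepSeek's OpenAI-compatible chat endpoint.
# The URL and model name below are assumptions from public reports.
API_URL = "https://api.deepseek.com/chat/completions"

payload = {
    "model": "deepseek-reasoner",  # assumed name for the R1 reasoning model
    "messages": [
        {"role": "user", "content": "Explain mixture-of-experts in one sentence."}
    ],
    "stream": False,
}

# Serialize the payload as it would be sent in an HTTP POST body.
body = json.dumps(payload)
print(body)
```

Because the interface follows the OpenAI chat-completions convention, existing OpenAI client libraries can typically be pointed at the DeepSeek endpoint by changing only the base URL, API key and model name.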
OpenAI vs. DeepSeek
DeepSeek represents the latest challenge to OpenAI, which established itself as an industry leader with the debut of ChatGPT in 2022. OpenAI has helped push the generative AI industry forward with its GPT family of models, as well as its o1 class of reasoning models.
While the two companies are both developing generative AI LLMs, they have different approaches.
| | OpenAI | DeepSeek |
| --- | --- | --- |
| Founding year | 2015 | 2023 |
| Headquarters | San Francisco, Calif. | Hangzhou, China |
| Development focus | Broad AI capabilities | Efficient, open source models |
| Key models | GPT-4o, o1 | DeepSeek-V3, DeepSeek-R1 |
| Specialized models | Dall-E (image generation), Whisper (speech recognition) | DeepSeek Coder (coding), Janus-Pro (vision model) |
| API pricing (per million tokens) | o1: $15 (input), $60 (output) | DeepSeek-R1: $0.55 (input), $2.19 (output) |
| Open source policy | Limited | Mostly open source |
| Training approach | Supervised and instruction-based fine-tuning | Reinforcement learning |
| Development cost | Hundreds of millions of dollars for o1 (estimated) | Less than $6 million for DeepSeek-R1, according to the company |
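The pricing gap in the table can be made concrete with a quick back-of-the-envelope calculation. This is only a sketch using the list prices above; actual bills vary with caching discounts and price changes.

```python
def cost(input_tokens, output_tokens, in_price, out_price):
    """Return USD cost of one request, given per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A request with 10,000 input tokens and 2,000 output tokens,
# priced with the per-million-token figures from the table above.
o1 = cost(10_000, 2_000, 15.00, 60.00)   # OpenAI o1
r1 = cost(10_000, 2_000, 0.55, 2.19)     # DeepSeek-R1

print(f"o1: ${o1:.4f}, R1: ${r1:.4f}, ratio: {o1 / r1:.0f}x")
# → o1: $0.2700, R1: $0.0099, ratio: 27x
```

At these list prices, the same request costs roughly 27 times more on o1 than on DeepSeek-R1.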
Training innovations in DeepSeek
DeepSeek trains its R1 models with a different approach than OpenAI uses, requiring less time, fewer AI accelerators and less money. DeepSeek's stated aim is to achieve artificial general intelligence, and the company's advancements in reasoning capabilities represent significant progress in AI development.
In a research paper, DeepSeek outlines the multiple innovations it developed as part of the R1 model, including the following:
- Reinforcement learning. DeepSeek used a large-scale reinforcement learning approach focused on reasoning tasks.
- Reward engineering. Researchers developed a rule-based reward system for the model that outperforms neural reward models that are more commonly used. Reward engineering is the process of designing the incentive system that guides an AI model's learning during training.
- Distillation. Using efficient knowledge transfer techniques, DeepSeek researchers successfully compressed capabilities into models as small as 1.5 billion parameters.
- Emergent behavior network. DeepSeek's emergent behavior innovation is the discovery that complex reasoning patterns can develop naturally through reinforcement learning without explicitly programming them.
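The rule-based reward idea in the list above can be illustrated with a toy example. The R1 paper describes accuracy rewards and format rewards checked by fixed rules rather than a neural reward model; the specific rules and scores below are simplified stand-ins, not DeepSeek's actual implementation.

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward: score a completion with fixed rules
    instead of a learned neural reward model. The rules and weights
    here are illustrative only."""
    reward = 0.0

    # Format rule: reasoning must appear inside <think>...</think> tags.
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        reward += 0.5

    # Accuracy rule: the final boxed answer must match the reference.
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match and match.group(1).strip() == reference_answer:
        reward += 1.0

    return reward

sample = "<think>2 + 2 is 4</think> The answer is \\boxed{4}"
print(rule_based_reward(sample, "4"))  # 1.5: both rules satisfied
```

Because the rules are deterministic, this kind of reward is cheap to compute at scale and cannot be gamed by a policy exploiting a learned reward model's blind spots, which is one motivation the R1 paper gives for the approach.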
DeepSeek large language models
Since the company was created in 2023, DeepSeek has released a series of generative AI models. With each new generation, the company has worked to advance both the capabilities and performance of its models:
- DeepSeek Coder. Released in November 2023, this is the company's first open source model designed specifically for coding-related tasks.
- DeepSeek LLM. Released in December 2023, this is the first version of the company's general-purpose model.
- DeepSeek-V2. Released in May 2024, this is the second version of the company's LLM, focusing on strong performance and lower training costs.
- DeepSeek-Coder-V2. Released in July 2024, this is a 236 billion-parameter model offering a context window of 128,000 tokens, designed for complex coding challenges.
- DeepSeek-V3. Released in December 2024, DeepSeek-V3 uses a mixture-of-experts architecture, capable of handling a range of tasks. The model has 671 billion parameters with a context length of 128,000.
- DeepSeek-R1. Released in January 2025, this model is based on DeepSeek-V3 and is focused on advanced reasoning tasks, competing directly with OpenAI's o1 model in performance while maintaining a significantly lower cost structure. Like DeepSeek-V3, the model has 671 billion parameters with a context length of 128,000.
- Janus-Pro-7B. Released in January 2025, Janus-Pro-7B is a vision model that can understand and generate images.
Why it is raising alarms in the U.S.
The release of DeepSeek-R1 has raised alarms in the U.S., triggering concerns and a stock market sell-off in tech stocks. On Monday, Jan. 27, 2025, the Nasdaq Composite dropped by 3.4% at market opening, with Nvidia declining by 17% and losing approximately $600 billion in market capitalization.
DeepSeek is raising alarms in the U.S. for several reasons, including the following:
- Cost disruption. DeepSeek claims to have developed its R1 model for less than $6 million. The low-cost development threatens the business model of U.S. tech companies that have invested billions in AI. DeepSeek is also cheaper for users than OpenAI.
- Technical achievement despite restrictions. The U.S. restricts the export of its highest-performance AI accelerators and GPUs to China. Despite that, DeepSeek has demonstrated that leading-edge AI development is possible without access to the most advanced U.S. technology.
- Business model threat. In contrast with OpenAI, whose technology is proprietary, DeepSeek's models are open source and free to use, challenging the revenue model of U.S. companies that charge monthly fees for AI services.
- Geopolitical concerns. Being based in China, DeepSeek challenges U.S. technological dominance in AI. Tech investor Marc Andreessen called it AI's "Sputnik moment," comparing it to the Soviet Union's space race breakthrough in the 1950s.
DeepSeek cyberattack
DeepSeek's popularity has not gone unnoticed by cyberattackers.
On Jan. 27, 2025, DeepSeek reported large-scale malicious attacks on its services, forcing the company to temporarily limit new user registrations. The timing of the attack coincided with DeepSeek's AI assistant app overtaking ChatGPT as the top downloaded app on the Apple App Store.
Despite the attack, DeepSeek maintained service for existing users. The disruption extended into Jan. 28, when the company reported it had identified the cause and deployed a fix.
DeepSeek has not specified the exact nature of the attack, though public reports widely speculated that it was a distributed denial-of-service (DDoS) attack targeting its API and web chat platform.
Sean Michael Kerner is an IT consultant, technology enthusiast and tinkerer. He has pulled Token Ring, configured NetWare and been known to compile his own Linux kernel. He consults with industry and media organizations on technology issues.