Alibaba, DeepSeek and the shift in the AI market
The Chinese tech giant's model release a day before the Lunar New Year shows its competitiveness against AI startup DeepSeek. It also signals a shift in innovation.
Chinese tech giant Alibaba is also feeling pressure after Chinese startup DeepSeek released an AI model that triggered an uproar in the West. DeepSeek claimed to have trained the model, comparable in capabilities to advanced Western models, at a fraction of the cost and with far fewer AI chips.
Alibaba released a new AI language model, Qwen 2.5-Max, on Tuesday. The release came a day before the Lunar New Year, when China's economy traditionally shuts down for 15 days.
Qwen 2.5-Max is a mixture-of-experts (MoE) model pretrained on 20 trillion tokens and post-trained with curated supervised fine-tuning and reinforcement learning from human feedback.
MoE is a technique in which a model is structured with multiple "minds," or experts. Each mind is compartmentalized so that whenever there is a query, the model uses adaptive routing to send it to the specific mind, or region, best suited to answer. For example, a coding query is routed to the mind geared toward coding.
MoE allows a model to be trained with less compute, so training can be faster and more cost-efficient. Other AI vendors, such as France-based Mistral AI, have also used this technique.
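The routing idea can be illustrated with a minimal sketch. It assumes a top-k gate, in which a small gating layer scores every expert and only the highest-scoring ones run. The function names, the toy experts and the gate weights are all hypothetical, and nothing here reflects how Qwen 2.5-Max or DeepSeek-V3 is actually implemented.

```python
import math

def softmax(scores):
    """Turn raw gate scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(token_vec, gate_weights, top_k=2):
    """Pick the top_k experts for one token; return (index, weight) pairs."""
    scores = [sum(w * x for w, x in zip(row, token_vec)) for row in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    # Renormalize so the selected experts' weights sum to 1.
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(token_vec, gate_weights, experts, top_k=2):
    """Run only the selected experts and blend their outputs."""
    routed = route(token_vec, gate_weights, top_k)
    out = [0.0] * len(token_vec)
    for idx, weight in routed:
        expert_out = experts[idx](token_vec)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, [idx for idx, _ in routed]

# Toy setup (hypothetical): four "experts" that just scale their input.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, used = moe_forward([2.0, 1.0], gate_weights, experts, top_k=2)
print("experts used:", used)
print("blended output:", output)
```

In this sketch, only two of the four experts run for the token, which is where the compute savings described above come from. Production MoE models apply the same idea inside each transformer layer and add load balancing across experts.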
Pressure in China
Qwen 2.5-Max is not comparable to the DeepSeek R1 model that caused a global selloff of AI companies' stock after its release Jan. 20. However, it is similar to DeepSeek-V3, another MoE model released earlier this month.
Alibaba's release shows the threat the tech giant -- the world's fourth-ranked public cloud vendor in terms of market share -- and other Chinese tech vendors feel regarding the startup. Following the DeepSeek R1 release, TikTok owner ByteDance also released an update to its AI model.
Moreover, Chinese tech giants engaged in a price war with DeepSeek last year after the AI startup released V2 for a user cost of only 1 yuan, or $0.14, per million tokens. By comparison, OpenAI's GPT-4 model's lowest-price tier costs $10 per million tokens.
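To put the two list prices side by side, the short calculation below uses only the per-million-token figures quoted above. The 500-million-token workload is a hypothetical example, not a figure from either vendor.

```python
# Prices per million tokens, as quoted in the article.
DEEPSEEK_V2_USD = 0.14   # 1 yuan
GPT4_LOWEST_USD = 10.00

def cost_usd(tokens, price_per_million):
    """Cost of processing `tokens` tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million

price_ratio = GPT4_LOWEST_USD / DEEPSEEK_V2_USD

# Hypothetical workload: 500 million tokens.
workload_tokens = 500_000_000
print(f"price ratio: {price_ratio:.1f}x")
print(f"DeepSeek V2: ${cost_usd(workload_tokens, DEEPSEEK_V2_USD):,.2f}")
print(f"GPT-4:       ${cost_usd(workload_tokens, GPT4_LOWEST_USD):,.2f}")
```

At those prices, the GPT-4 tier costs roughly 71 times as much per token.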
The timing of the Alibaba and ByteDance releases shows that DeepSeek has spurred bigger AI technology vendors to launch their products more quickly than they originally planned.
"We know that Alibaba's cloud unit has been voraciously beefing up its AI technology, but I think this underscores the immense pressure put on all AI companies in the wake of DeepSeek's spectacular rise," said Lisa Martin, an analyst at Futurum Group.
A shift in the AI market
The competitive edge that DeepSeek brings also reflects a change in the AI market.
"The progress around building leaner and more powerful models continues," said Arun Chandrasekaran, a Gartner analyst. "We will see a lot more innovation in algorithmic and the software layer, in terms of building more efficient models that run on constrained infrastructure, and that are also more price competitive from an inferencing API standpoint."
The apparently distinct innovations work together and are not standalone, Chandrasekaran said.
"It's almost like one model company is building on top of the other," he continued. "These model companies are becoming very, very good at reverse engineering these techniques and then quickly improving on those techniques to do something bigger, better, cheaper and smaller."
The innovation shows that previous assumptions about model training and inferencing no longer hold, said Bradley Shimmin, an Omdia analyst. The AI market has shifted to the degree that the massive costs previously thought necessary to build a big AI model are no longer required. GPT-4 cost upward of $100 million to train, according to OpenAI CEO Sam Altman, while DeepSeek said it spent about $6 million to build R1.
"We've spent the last almost three years now really trying to optimize how transformers function, and these are the gains that you're seeing right now," Shimmin said. "There are a number of these now that are showcasing just how efficient we've been able to push these basic machine learning ideas that we've had and been working with for the last 60 years."
Competition and data
DeepSeek is an example of how fast vendors can now innovate and refine generative AI technology, Shimmin said. With R1, the AI vendor used distillation, in which a larger model is used to teach a smaller model so that it behaves similarly. The company also built its models on top of social media giant Meta's Llama family of open models, so the models are under a billion parameters in size and can run on a laptop.
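One common way to express distillation is a loss that pushes the smaller "student" model's output distribution toward the larger "teacher" model's. The sketch below assumes that textbook soft-label formulation, with a temperature that softens both distributions. The article does not describe DeepSeek's actual procedure, so the function names and the example logits are purely illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the student's softened distribution to the teacher's."""
    p = softmax(teacher_logits, temperature)  # teacher: the target
    q = softmax(student_logits, temperature)  # student: being trained
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by temperature squared is the usual convention for this loss.
    return kl * temperature ** 2

# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]
far_student = [0.5, 1.0, 4.0]

print("close student loss:", distillation_loss(teacher, close_student))
print("far student loss:  ", distillation_loss(teacher, far_student))
```

Training the student to minimize this loss over many examples is what lets a smaller model imitate a larger one. A student that already agrees with the teacher scores near zero, while one that disagrees scores higher.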
However, innovation comes at a price, and while it appears the AI vendors are accelerating side by side to release new AI models, a competitive edge still exists for the established generative AI vendors.
The chief competitor to not only DeepSeek, but also Alibaba, is OpenAI. Reports surfaced Tuesday that OpenAI and its principal investor, Microsoft, are looking into evidence that DeepSeek distilled or took data from OpenAI's models without permission.
However, OpenAI has also been accused of using data from others without permission, which is why some enterprises prefer to use the vendor's technology through Microsoft Azure instead of directly.
The technique of providing the technology through a safety buffer should be the approach for all AI model providers, Shimmin said.
Options such as DuckDuckGo and Brave for consumers create an anonymous relay so that models like Qwen 2.5 or DeepSeek R1 can be used without users giving access to personal data, he said.
For enterprises, Alibaba's Qwen 2.5 series is licensed under the Apache 2.0 license, while DeepSeek is licensed under an MIT license.
"Those two licenses are extremely permissive, and they give you the opportunity as a business to vet the code itself, not just the weights, but the code itself, and to ensure that what you're seeing is what you're getting, and you're not going to be opening yourself up to any kind of breach of privacy or security," Shimmin said.
Esther Shittu is an Informa TechTarget news writer and podcast host covering artificial intelligence software and system