What’s new and not new with OpenAI’s latest reasoning models
The o3 and o4-mini models are supposed to be better than previous models. While they have improved abilities that can be seen as agentic, the upgrades are mostly incremental.
OpenAI introduced the next iteration of its reasoning models, which it says are its smartest and most capable models to date.
The new o3 and o4-mini models, released Wednesday, can perform tool calling, using and combining every tool within ChatGPT, the vendor's AI chatbot. They can also search the web and analyze uploaded files and other data with Python.
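For developers, this kind of tool use typically surfaces through the API's function-calling interface. The sketch below, written with the OpenAI Python SDK, shows roughly what that looks like; the get_weather function and its schema are hypothetical, and the model name is used only as an illustration.

```python
# Minimal sketch of tool calling with the OpenAI Python SDK.
# The get_weather function and its schema are hypothetical; the
# model name "o4-mini" is illustrative of the models discussed here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": "Should I pack an umbrella for Boston?"}],
    tools=tools,
)

# If the model decides the tool is needed, it returns a structured call
# rather than a plain-text answer.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

The point of the interface is that the model, not the developer, decides whether and when to reach for a tool, which is the behavior OpenAI says the new models were trained on.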
According to OpenAI, the new model versions -- which succeed the o1 generation -- are a step toward a more agentic ChatGPT capable of acting autonomously or semi-autonomously.
The models are trained to decide when and how to use the tools within ChatGPT.
The o3 model is adapted to handle complex questions that require in-depth analysis, with answers that are not obvious. It can analyze visuals such as images, charts and graphics. Meanwhile, o4-mini is optimized for fast, cost-efficient reasoning. It is good for math, coding and visual tasks.
Both models demonstrate improved instruction following and more useful responses than OpenAI's previous reasoning models, the vendor said.
The AI vendor said the models can also think with images. Users can upload a photo of a whiteboard or hand-drawn sketch, and the models can interpret it, even if the image is blurry.
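In API terms, the same capability comes down to passing an image alongside text in a single message. Below is a minimal sketch, again assuming the OpenAI Python SDK, with a placeholder image URL and an illustrative model name.

```python
# Minimal sketch of sending an image to a reasoning model through the
# chat completions API. The image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o4-mini",  # illustrative model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this whiteboard sketch describe?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/whiteboard.jpg"}},
        ],
    }],
)

print(response.choices[0].message.content)
```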
The new iteration of OpenAI reasoning models comes a few days after the AI vendor launched GPT-4.1 while telling users it plans to turn off GPT-4.5. It also comes three months after the vendor introduced the o3-mini reasoning model.
Test-time reasoning and agentic AI
The new models rely on the same techniques as other reasoning models, such as test-time reasoning, with only incremental improvements.
Test-time reasoning is a technique that lets a model spend more computation working through a problem and apply different problem-solving strategies, rather than simply regurgitating responses drawn from the web and other data sources.
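For developers, this trade-off between speed and depth is typically exposed as a setting rather than a separate model. Here is a minimal sketch, assuming the reasoning_effort parameter OpenAI offers for its o-series models in the chat completions API; the prompt and model name are illustrative.

```python
# Minimal sketch of dialing test-time reasoning up or down.
# reasoning_effort is the knob OpenAI exposes for its o-series models
# (assumption for illustration); the prompt and model name are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="high",  # "low" and "medium" trade depth for speed and cost
    messages=[{"role": "user",
               "content": "Prove that the square root of 2 is irrational."}],
)

print(response.choices[0].message.content)
```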
"Perhaps, if anything, what they're doing is setting expectations a little bit better for how long different questions or tasks will take with it," said Bradley Shimmin, an analyst at Futurum Group.
However, many model providers are doing similar things with their models, and OpenAI's moves are unsurprising, Shimmin continued.
In any case, the new image thinking capabilities of o3 and o4-mini are a significant step for foundation models meant to support agentic AI, said Lian Jye Su, an analyst at Omdia, a division of Informa TechTarget.
"The o series models have always meant to be multimodal," Su said. "They're meant to be less text-based and on the image side. It's a significant thing more in the sense that ... they're almost like an agent because of their capability. As the models become more powerful, they will continue to improve, meaning they have almost the agentic capability."
He added that the "o" series models could be considered to have agentic capabilities because they differ from traditional models that follow a single, straightforwardly given instruction. Instead, these and other multimodal models can be given more complex instructions or targets and use problem-solving skills to fulfill them.
"I do expect the multimodal foundation models to have that capability as the model becomes smarter," Su said. "It doesn't mean they will completely replace [agents]; it just means [some]of the complex tasks that AI agents do, they can now fulfill as well."
On the other hand, the agentic capabilities that OpenAI highlighted in o3 and o4-mini, such as web searching and analyzing data, are not necessarily new and are a natural evolution of how AI models have been used, Shimmin said.
He added that the image capability of the o3 and o4-mini models is not "a new class of models."
"It's just saying the model is multimodal in that it can take in audio, visual, image, video and text, and that it can reason about how to work with those and do things with those because it's a model that features test-time reasoning," he continued.
Need for more openness
Meanwhile, as OpenAI continues to release new models that improve on its older ones, the vendor leaves two things in its wake: a desire for an open model from OpenAI, which has not released one since GPT-2, and confusion about its ultimate destination.
"They seem to be trending in the right direction in terms of the model," Su said. "[However], enterprises may be more willing to embrace open source and open standards in the industry as the benefits appear and the cost comes down."
"Companies that are not truly open source already have a knock against them a bit," Shimmin said.
Enterprises intrigued by OpenAI's models will continue to experiment with them and build proofs of concept. But the back-and-forth that OpenAI engages in with its models, such as turning off GPT-4.5 and launching successive model generations with similar names in short periods, could be confusing for enterprises.
"I don't think that what they're doing is undermining trust so much as making it harder for companies to take advantage of the new features that they're bringing to market because of that confusion," Shimmin said.
OpenAI also shared a new experiment called Codex CLI. This coding agent runs from a user's terminal and is designed to take full advantage of the reasoning capabilities of o3 and o4-mini. The vendor is also launching a $1 million initiative to support projects using Codex CLI and OpenAI models.
The o3 and o4-mini models are now available for ChatGPT Plus, Pro and Team users. ChatGPT Enterprise and Edu users will gain access in a week, the vendor said.
Esther Shittu is an Informa TechTarget news writer and podcast host covering artificial intelligence software and systems.