Amid an escalating global AI race, Nvidia revealed at its closely watched GTC developer conference on Tuesday how it's moving forward in reasoning and physical AI.
The AI hardware and software giant introduced a new family of open reasoning models for developers and enterprises, and an open and fully customizable reasoning model for physical AI development.
The Llama Nemotron family of open reasoning models is designed to give developers and enterprises a foundation for creating advanced AI agents.
Along with Llama Nemotron, Nvidia moved ahead in the emerging field of physical AI by introducing new Cosmos world foundation models, including Cosmos Reason, an open and fully customizable reasoning model for physical AI development. Physical AI integrates AI with machines, enabling them to interact with the physical world.
The vendor, the world's leading provider of advanced AI chips, also released Nvidia Dynamo, an open source inference software platform for accelerating and scaling AI reasoning models in AI factories. Nvidia defines AI factories as environments where enterprises gain value from AI by orchestrating the entire AI lifecycle, from data ingestion to training.
The move toward reasoning exemplifies a trend in the generative AI market that started last year with Google's Gemini 2.0 Flash and OpenAI's o1 models. It intensified with Chinese AI startup DeepSeek's introduction of its R1 model in January.
Being efficient
While Nvidia is jumping into the reasoning movement in earnest, it said it is also working to make the models more efficient.
"There's a lot of innovation now happening in techniques to make these models more efficient," said Gartner analyst Arun Chandrasekaran.
One example of this efficiency is the range of sizes in which the Llama Nemotron models come.
The models are available as Nvidia NIM microservices in Nano, Super and Ultra sizes. The Nano model provides high accuracy on PCs and edge devices. The Super model balances accuracy and throughput on a single GPU, and the Ultra model, available soon, will provide maximum agentic accuracy on multi-GPU servers, according to Nvidia.
Another example of how Nvidia is pushing efficiency is the hybrid capabilities of the models. Users can reduce the number of tokens they use by turning off reasoning for questions that do not warrant it.
Most reasoning models use thousands of tokens to generate their answers. Therefore, the option to turn off reasoning is beneficial for users, said Kari Briski, vice president of generative AI software for enterprise at Nvidia.
"There are times you definitely need reasoning," Briski said in an interview with Informa TechTarget. "Reasoning is a game changer. And then there are times when you don't really need it, and being able to have a single, multi-task model to do both of those is important."
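Nvidia's published examples suggest the reasoning toggle is controlled through the system prompt on a standard OpenAI-compatible chat endpoint. The sketch below illustrates that pattern; the "detailed thinking on/off" control phrase and the model name are assumptions drawn from Nvidia's NIM documentation, so check the model card for the exact wording before relying on them.

```python
# Illustrative sketch of toggling reasoning on a hybrid model served through an
# OpenAI-compatible chat endpoint. The "detailed thinking on/off" system prompt
# and the model name are assumptions based on Nvidia's published NIM examples.

def build_chat_payload(question: str, reasoning: bool,
                       model: str = "nvidia/llama-3.3-nemotron-super-49b-v1") -> dict:
    """Build a chat-completions request, enabling or disabling reasoning
    via the system prompt rather than by switching to a separate model."""
    mode = "detailed thinking on" if reasoning else "detailed thinking off"
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": mode},
            {"role": "user", "content": question},
        ],
        # Reasoning traces consume thousands of tokens, so budget more
        # only when reasoning is actually needed.
        "max_tokens": 4096 if reasoning else 512,
    }

# A multi-step math question warrants reasoning; a simple lookup does not.
hard = build_chat_payload("How many prime numbers are below 100?", reasoning=True)
easy = build_chat_payload("What is the capital of France?", reasoning=False)
```

The point of the single-model design is visible in the payloads: only the system prompt and token budget change, so an application can decide per request whether to pay the reasoning cost.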
Nvidia is not the only AI vendor with a hybrid reasoning model. Anthropic introduced its hybrid reasoning model, Claude 3.7 Sonnet, last month.
Nvidia plans to release the synthetic data sets and the post-training techniques it used to train and develop the Llama Nemotron models so enterprises can use them to build their own custom reasoning models.
Nvidia CEO Jensen Huang introduces a new system for physical AI digital twins Tuesday at the company's GTC developer conference.
Making its mark with inferencing
Nvidia Dynamo is the successor to the Nvidia Triton Inference Server. Nvidia says it maximizes token revenue generation for AI factories deploying reasoning models.
"Inference is going to be one of the most important workloads in the next decade as we scale out AI," CEO Jensen Huang said during his GTC keynote.
Nvidia said the inference-serving software includes four components that reduce inferencing costs and improve user experience.
It has a GPU planner that adds and removes GPUs to adjust to fluctuating user demand, and an inference-optimized library that supports GPU-to-GPU communication. Its memory manager offloads and reloads inference data to and from lower-cost memory and storage devices without affecting user experience, according to Nvidia. And the platform's Smart Router directs requests across GPUs to minimize costly GPU recomputation.
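The GPU-planner idea can be sketched as a simple scaling rule: size the serving pool to the request queue, within the pool's limits. This is a conceptual illustration of the scheduling pattern, not Nvidia Dynamo's actual API or algorithm.

```python
# Conceptual sketch of a GPU planner: scale the number of serving GPUs up or
# down with request demand. An illustration of the pattern only -- not Nvidia
# Dynamo's real interface, which readers should consult on GitHub.

def plan_gpu_count(queued_requests: int, per_gpu_capacity: int,
                   min_gpus: int = 1, max_gpus: int = 8) -> int:
    """Return the GPU count needed to drain the current queue,
    clamped to the pool's minimum and maximum."""
    needed = -(-queued_requests // per_gpu_capacity)  # ceiling division
    return max(min_gpus, min(max_gpus, needed))

# Demand spikes: the planner grows the pool toward its maximum.
print(plan_gpu_count(queued_requests=350, per_gpu_capacity=50))  # 7
# Demand falls: idle GPUs are released back to the pool.
print(plan_gpu_count(queued_requests=40, per_gpu_capacity=50))   # 1
```

A production planner would also weigh GPU spin-up latency and in-flight requests, but the revenue logic is the same: unused GPUs are cost, while an undersized pool leaves tokens unserved.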
Dynamo is an example of how Nvidia is becoming better at inferencing, said Mike Gualtieri, an analyst at Forrester Research.
"Nvidia kind of got ahead of things this time because, for the longest time, they weren't that good at inferencing at all," Gualtieri said. "They were really solid at training."
Inferencing was an area in which Nvidia faced strong competition from vendors such as AMD and Fujitsu and hyperscalers like AWS, which has its own inferencing product, AWS Inferentia.
"If you calculate the market for inferencing, the market looks like it could be huge, and it's untapped," Gualtieri said. "I don't know if they want to win that, but they can. I'm also not sure they can win just by being a model training company."
The vendor will make Nvidia Dynamo available in Nvidia NIM microservices, supported by an upcoming release of Nvidia's AI Enterprise software platform.
Nvidia also introduced its AI Data Platform, a customizable reference design that providers can use to build AI infrastructure for demanding AI inference workloads.
Agentic AI
Nvidia's AI-Q Blueprint is a new framework for developing agentic systems.
The new Nvidia AgentIQ toolkit powers AI-Q Blueprint, an open source platform for connecting, profiling and optimizing teams of AI agents, now available on GitHub.
With these new tools, Nvidia is showing customers that it's not just an infrastructure vendor, Chandrasekaran said.
"They're trying to tell customers, 'We can bring these models into your environment, and we build integration on the back end to simplify the deployment of the model,'" he said.
World foundation models and reasoning
Not only is Nvidia joining the reasoning movement, but it's also expanding reasoning with world foundation models.
The AI vendor introduced several new world foundation models (WFMs), including Cosmos Reason, an open, customizable WFM that uses chain-of-thought reasoning to understand video data and predict interaction outcomes in natural language.
Nvidia said the model can improve physical AI data annotation and curation. Developers can also post-train the model to build planners that tell machines in the physical world what they need to do to complete a task.
Nvidia also introduced Cosmos Transfer for synthetic data generation. Cosmos Transfer streamlines perception AI training by transforming 3D simulations created in Omniverse into photorealistic videos for large-scale synthetic data generation, Nvidia said.
The AI vendor also revealed new Omniverse Blueprints built on Nvidia Cosmos.
One of the new Omniverse Blueprints is Mega, a Blueprint for testing multi-robot fleets at scale in industrial digital twins.
Gualtieri said Nvidia is seeking to inspire its customers with its move toward reasoning, agentic AI and physical AI.
"They're doing a lot of this stuff just to demonstrate that they're pushing the edge," he said.
Esther Shittu is an Informa TechTarget news writer and podcast host covering artificial intelligence software and systems.