Nvidia's new model aims to move GenAI to physical world
Nvidia Cosmos is trained on 20 million hours of video focused on the physical world. It works with Nvidia Omniverse and advances LLM technology, the vendor says.
AI hardware-software giant Nvidia introduced what the vendor is calling the next wave of generative AI technology: world foundation models.
During a keynote at the CES show in Las Vegas on Jan. 6, Nvidia CEO Jensen Huang unveiled world foundation models, a new family of open models to advance agentic AI, a new Agentic AI Blueprint system, updates to Omniverse and a personal AI supercomputer.
Nvidia Cosmos
The world foundation model, dubbed Nvidia Cosmos, differs from large language models in that it was trained to understand the physical world, Huang said.
Instead of being trained on text and language, the model was trained on 20 million hours of video focused on physical things. The footage covers processes such as humans walking and hands moving and manipulating objects.
"It's really about teaching the AI -- not about generating and creating content, but teaching the AI to understand the physical world," Huang said.
The benefit is the ability to generate synthetic data to train models, which could be useful for industries such as transportation and supply chain management and for applications such as autonomous vehicles that need synthetic data in the absence of physical data.
Developers can also use Cosmos for reinforcement learning from AI feedback, either to improve models or to test and validate model performance, instead of relying on human feedback. Cosmos can generate tokens in real time and train robots, large language models and multimodal models.
Nvidia Cosmos is available now in open license on GitHub.
The world model comes in three sizes: Cosmos Nano, Cosmos Super and Cosmos Ultra. Cosmos Nano is for low-latency, real-time models. Cosmos Super is for out-of-the-box fine-tuning and deployment. Cosmos Ultra is for maximum accuracy and policy, and it provides the most advanced knowledge transfer for custom models.
"Cosmos World foundation models being open, we really hope will do for the world of robotics and industrial AI what Llama 3 has done for enterprise AI," Huang said, referring to Meta's popular open foundation model.
From digital AI to physical AI
Cosmos helps bring generative AI from the digital world into the physical world, said Bob O'Donnell, founder and analyst at TECHnalysis Research.
"It's moving AI from being more of a digital phenomenon to being a physical phenomenon," O'Donnell said. "It's another layer of software that allows developers to create and people to leverage these GPUs for physical actions."
Cosmos also enables enterprises to build and customize models at scale, Gartner analyst Chirag Dekate said.
"This is about enabling enterprises to build and scale robotic AI and autonomous vehicle AI more effectively," Dekate said. "That is where you're likely going to see some first-generation real-world impact of GenAI manifest."
Nvidia Cosmos and Omniverse
What gives Cosmos its physical AI capabilities is Nvidia Omniverse.
Nvidia Omniverse is the vendor's AI platform for building digital twins and digital simulations. Cosmos is an additional layer on top of Omniverse. Cosmos helps developers better scale and design the digital simulations and digital twins they create in Omniverse.
"The combination of the two gives you a physically simulated, physically grounded multiverse generator," Huang said.
Cosmos, with the underlying power of Omniverse, shows the next opportunity for Nvidia, especially in industries in which the vendor has dabbled but hasn't made a strong impact, such as autonomous vehicles, robotics and AI PCs, said Dan Newman, CEO of the Futurum Group.
"Here, Nvidia has the opportunity to basically offer all the parts and pieces to enable companies like Tesla to build humanoid robots, to be able to have them do real-world tasks," Newman said.
Existing robotic mechanisms can already perform physical tasks in industrial settings, for example. With physical AI, Nvidia says, humanoid robots will be able to use computer vision more effectively, operate more autonomously and be trained more easily.
"We do this in a limited capacity, but scale has been very hard, because training has been very hard," Newman said. "But now building with these models is the ability for this to all be done in a physical world, whereas everything up to now has been mostly done in like this digital sort of AI."
The challenge Nvidia faces with Cosmos is enterprises understanding the technology, O'Donnell said.
"It's a hard concept for people to understand," he said.
Moreover, the use cases appear limited to the physical world and the world of autonomous vehicles and robotics, Dekate said.
"It seems narrow, but if you want to fast forward to the future ... in an AI-native industrial space, chances are these robotic scenarios are going to be far more pervasive than we see them today," he said.
In addition to Cosmos, Nvidia introduced Mega, a new Omniverse Blueprint for developing, testing and optimizing physical AI and robot fleets at scale in a digital twin.
New agentic AI offerings
Nvidia also launched Agentic AI Blueprints. These enable enterprise developers to build and deploy custom AI agents. The new blueprints are integrated with Nvidia AI Enterprise software, including NIM microservices and Nvidia NeMo Retriever.
"What enterprises will like is that the Agentic Blueprints fit into Nvidia's enterprise software stack, therefore simplicity," said Patrick Moorhead, an analyst with Moor Insights and Strategy. "This is a big disadvantage for companies like Intel and AMD. Enterprises are currently willing to accept the Nvidia software lock-in in exchange for time to market."
Agentic AI Blueprints is an expansion of Nvidia's platform evolving from language and generative models to agents that can perform multi-directional work streams, in which separate teams or individuals within a project handle tasks that don't depend directly on each other, Newman said.
"This is a democratization of more software," he said. "[It] creates a stronger relationship between developers and Nvidia that are building agented capabilities for their customers."
While scenarios for AI agents and agentic workflows have so far centered on performing tasks -- something Salesforce's Agentforce is built to do, for example -- enterprises need tools that can orchestrate, manage and monitor these agents.
Beyond Agentic Blueprints, Nvidia introduced Llama Nemotron, a family of open models that developers can use to create and deploy AI agents across different applications such as customer support, fraud detection and supply chain management. The models were built with Llama foundation models and are available in three sizes. Nano is for deployment on PCs and edge devices. Super offers high throughput on a single GPU. Ultra is designed for data center-scale applications.
Llama Nemotron is a NIM microservice for Nvidia RTX AI PCs and workstations. It will perform agentic AI tasks like following instructions, coding, chatting and math. It's unclear when it will be available.
AI PC and supercomputer
The AI vendor also previewed Project R2X, a vision-enabled PC avatar, or digital human interface on a PC that can assist with desktop apps, video conference calls, and reading and summarizing documents.
In addition, Nvidia revealed that NIM microservices are coming to PC users through AI Blueprints. These blueprints let users create podcasts from PDF documents and generate images. The podcast blueprint uses the Mistral Nemo 12B Instruct model, from independent AI vendor Mistral AI, for language and the Nvidia Riva model for text-to-speech. The NIM microservices feature and AI Blueprints will be available in February.
Nvidia also plans to launch a personal AI supercomputer in May. Project Digits provides AI researchers, data scientists and students access to Nvidia's new GB10 Grace Blackwell Superchip. Users can develop and run inference on models on their desktop PCs as they would on the cloud or in a data center, Nvidia said.
Esther Shittu is an Informa TechTarget news writer and podcast host covering artificial intelligence software and systems.