Google Gemini 2.0 explained: Everything you need to know
Google Gemini 2.0, announced in December 2024, enhances generative AI with multimodal capabilities and agentic AI features, and integrates across Google services.
Google, among the leading model builders in the generative AI world, debuted Gemini 2.0 -- the current iteration of its flagship Gemini family -- in December 2024. Gemini 2.0 focuses on agentic AI in multistage large language model workflows.
Google's Gemini family, which, in many respects, succeeds the earlier Pathways Language Model (PaLM) family of LLMs, debuted with Gemini 1.0 in December 2023. Gemini 1.5 Pro debuted in February 2024 and was expanded throughout the year, leading to the embrace of agentic AI with Gemini 2.0.
Agentic AI moves LLMs beyond simple operations with more automation and the ability to chain together multiple models in pursuit of a result. Agentic AI systems are also typically capable of calling external functions, such as sending an email or issuing a payment as part of a workflow.
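To make the function-calling idea concrete, here is a minimal sketch using Google's google-genai Python SDK, which can pass a Python function to the model as a callable tool. The send_email helper and its parameters are hypothetical stand-ins for an external action, not a documented Google workflow.

```python
# Minimal sketch of agentic function calling with the google-genai Python SDK.
# The send_email helper is hypothetical; a real system would call an email
# service here. The SDK supports invoking Python functions passed as tools.
from google import genai
from google.genai import types


def send_email(recipient: str, subject: str, body: str) -> str:
    """Hypothetical stand-in for an external action the agent can take."""
    print(f"Sending email to {recipient}: {subject}")
    return "sent"


client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Email alice@example.com to confirm Friday's 10 a.m. demo.",
    # The model decides whether to call send_email and with what arguments.
    config=types.GenerateContentConfig(tools=[send_email]),
)
print(response.text)
```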
With Gemini 2.0, Google directly competes with the OpenAI o1 reasoning model. For example, one Google model variant, Gemini 2.0 Flash Thinking, is the first Gemini version with the ability to think and reason as o1 does.
What is Gemini 2.0?
Gemini 2.0 is a portfolio of LLMs developed by Google as the company's leading generative AI technology.
In contrast to prior Gemini releases, which debuted with base models followed later by a Flash model -- a lighter, optimized and cost-efficient variant of an LLM -- the first preview of Gemini 2.0 was the Gemini 2.0 Flash model.
On Dec. 19, 2024, Google announced Gemini 2.0 Flash Thinking, an experimental model variant with more reasoning capabilities than the base Gemini 2.0 Flash model.
Overall, Gemini 2.0 is a multimodal LLM that delivers comprehensive generative AI capabilities, including content generation, summarization and data analysis across text, images, audio and video.
What's new in Gemini 2.0?
Gemini 2.0 continues the Google Gemini LLM family evolution from versions 1.0 and 1.5, introducing several enhancements, including the following:
- Multimodal output. Unlike Google's prior LLMs, Gemini 2.0 can generate content in a multimodal fashion, including images, text and audio. The multilingual native audio output capability initially provides eight distinct voices with various accents across multiple languages. The native image output capabilities let users generate highly customized images.
- Agentic AI enablement. Gemini 2.0 provides multimodal understanding, coding, function calling and the ability to follow complex instructions, underpinning better agentic experiences. The model understands more of the world, thinks multiple steps ahead and acts on behalf of users with their supervision.
- Native tool use. One characteristic of agentic AI is calling external functions. With Gemini 2.0, Google enables the LLM to use native tools, such as Google Search and Google Maps, as part of an LLM query or agentic AI workflow; see the sketch after this list.
- Multimodal Live API. Gemini 2.0 introduces the Multimodal Live API, which lets developers stream data -- such as audio and video from user screens or cameras -- into the model and receive generative AI output in real time.
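As a rough illustration of native tool use, the following sketch asks Gemini 2.0 Flash to ground a response with Google Search through the google-genai Python SDK, as documented at the time of the preview; treat the exact types and model ID as subject to change.

```python
# Sketch of Gemini 2.0 native tool use: grounding a query with Google Search.
# Based on the google-genai Python SDK as of the Gemini 2.0 preview.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Who won the most recent Formula 1 constructors' championship?",
    config=types.GenerateContentConfig(
        # Declare Google Search as a native tool the model may invoke.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```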
Gemini 2.0 Flash
The first iteration of Gemini 2.0 that Google made available, Gemini 2.0 Flash is an experimental model and the successor to Gemini 1.5 Flash, though Google says it outperforms the Gemini 1.5 Pro model as well.
Gemini 2.0 Flash's key attributes include the following:
- Improved speed. Google reports that Gemini 2.0 Flash is twice as fast as its predecessor, Gemini 1.5 Flash, with significantly improved time to first token in particular.
- High quality. Speed improvements in an LLM often come with a drop-off in quality and accuracy, but that's not the case with Gemini 2.0 Flash, according to Google, which claims the model maintains quality comparable to the slower Gemini 1.5 Pro.
- Enhanced performance. Gemini 2.0 Flash performs better across multiple benchmarks, including Massive Multitask Language Understanding Pro (MMLU-Pro), which evaluates responses across several subjects; HiddenMath, which provides competition-level math problems; and Natural2Code for code generation.
- Energy efficiency. The enhanced performance of Gemini 2.0 Flash couples with improved energy efficiency, potentially leading to better battery life on mobile devices.
How does Gemini enhance Google?
Google continues to deepen the integration of generative AI capabilities across its products and services. Those capabilities are powered by LLMs, a trend that continues with the Gemini 2.0 model.
Gemini 2.0 is set to boost various Google products and services, among them:
- Google Search. Google added generative AI to search with AI Overviews, formerly the Search Generative Experience, which provides comprehensive answers. Gemini 2.0 promises more power and new multimodal capabilities for AI Overviews.
- Google Workspace. The Gemini LLM is already integrated into Google Workspace applications, including Docs, Slides and Meet. Gemini 2.0 enhances and extends those capabilities.
- Android devices. Google highlights its aspirations for on-device LLM integration, particularly on its flagship Pixel smartphones.
- Google AI Studio. For developers, Gemini 2.0 enables the creation of advanced multimodal and agentic AI applications with Google AI Studio and Vertex AI.
Agentic experiences
Gemini 2.0's focus on agentic AI enables the model to understand complex scenarios, plan multiple steps ahead and take action on behalf of users.
Google is further exploring agentic AI capabilities in experimental efforts that provide various agentic experiences. Among the experiences Google has publicly announced are the following:
- Project Astra universal AI assistant. Google is using Gemini 2.0 to build a new agentic AI-powered assistant providing integrated tool use with Google Search, Lens and Maps. It also supports real-time conversations in multiple languages with better accent understanding.
- Project Mariner browser-based agent. This agentic experience understands and reasons across browser screen information, navigating web interfaces through a Google Chrome extension. It also can type, scroll or click in active browser tabs.
- Jules developer code agent. Google is also previewing an agentic AI coding experience that integrates directly with GitHub workflows. The agent develops and executes a plan for code changes and resolves coding issues independently.
- Gaming agents. Google is also experimenting with Gemini 2.0-powered agents to provide real-time game analysis and suggestions. These agents understand rules and provide real-time conversations about gameplay. Initial Google testing includes a handful of games, such as Supercell's Clash of Clans and Hay Day.
Will Gemini 2.0 integrate with other platforms?
As with prior versions of Gemini, the LLM is not limited to Google's platform.
The Google AI Studio and Vertex AI services enable developers to build applications deployable anywhere. Additionally, Gemini 2.0's native tool use capability suggests integration with various third-party applications and APIs, though Google has not provided a detailed list of those integrations as of January 2025.
When will Gemini 2.0 be available and what are its costs?
Gemini 2.0 Flash is currently available as an experimental model to developers via the Gemini API in Google AI Studio and Vertex AI. General availability is expected by the end of January 2025, as are additional model variants.
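For developers trying the experimental model, a minimal call through the Gemini API looks roughly like the following, assuming the google-genai Python SDK and an API key generated in Google AI Studio; the prompt is arbitrary.

```python
# Minimal call to the experimental Gemini 2.0 Flash model via the Gemini API.
# Assumes the google-genai Python SDK (pip install google-genai) and an API
# key from Google AI Studio.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental model ID at launch
    contents="Summarize what agentic AI means in two sentences.",
)
print(response.text)
```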
Gemini 2.0 is also available in the Gemini app, where users globally can opt into a chat-optimized version of 2.0 Flash experimental.
Pricing details haven't been announced. Google AI Studio offers limited free access in the experimental stage.
Sean Michael Kerner is an IT consultant, technology enthusiast and tinkerer. He has pulled Token Ring, configured NetWare and been known to compile his own Linux kernel. He consults with industry and media organizations on technology issues.