
What is Gemma? Google's open source AI model explained

Gemma is a collection of lightweight open source generative AI (GenAI) models. It was created by Google DeepMind, the research lab that also developed Gemini, Google's family of closed source generative AI models. Google makes Gemma available in several sizes and for use with popular developer tools and Google Cloud services.

The name Gemma comes from the Latin word for precious stone.

Gemma has gone through multiple versions and iterations. Google first released Gemma on Feb. 21, 2024, with two models: Gemma 2B and Gemma 7B. The second generation followed on June 27, 2024, with Gemma 2 in 9B and 27B sizes; a 2B variant was added on July 31, 2024. Gemma 3 debuted on March 10, 2025, in 1B, 4B, 12B and 27B sizes.

The original Gemma releases were not as large or powerful as popular AI models such as OpenAI's GPT-4 and Google's Gemini Ultra and Pro. However, Gemma's compact, lightweight models can run on laptop or desktop computers because they have faster inference speeds and lower computational demands. With the debut of Gemma 3, Google claimed the model could outperform larger open source models, including DeepSeek-V3 and Llama 3 405B.

Gemma also runs on mobile devices and public clouds. Nvidia worked with Google to optimize Gemma for its graphics processing units (GPUs). Because of this wide platform and hardware support, Gemma can run on GPUs, central processing units (CPUs) or Google Cloud's Tensor Processing Units (TPUs).

Although the models are open, Google's terms also permit commercial use and distribution of Gemma.

How is Gemma different from other AI models?

Gemma differs in several ways from popular AI chatbots, including Google's Gemini. Gemma stands out for being open and lightweight. Gemini and OpenAI's GPT family of models used in ChatGPT are closed models, and neither is lightweight enough to run on laptops. Because ChatGPT and Gemini are closed, developers cannot customize their code as they can with the open source Gemma.

Gemma is not Google's first open AI model, but it is more advanced in training and performance than older open models such as BERT and T5. OpenAI, the developer of ChatGPT, has not released open versions of its flagship GPT models.

Google offers both pretrained and instruction-tuned Gemma models that can run on laptops and workstations. Similar to Gemma, Meta's Llama family of LLMs consists of open source AI models that can potentially run locally on laptops. Llama models are widely available to developers through Hugging Face and other platforms.

Open source AI models have become increasingly popular over time. Other open source AI models include DeepSeek, Ai2's Tulu, IBM Granite, Mistral AI, Qwen, Falcon 180B, Bloom, Databricks Dolly and Cerebras-GPT. 

What is Gemma used for?

Developers can use Gemma to build their own AI applications, such as chatbots, text summarization tools and retrieval-augmented generation (RAG) applications. Because it is lightweight, Gemma is a good fit for real-time GenAI applications that require low latency, such as streaming text.
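A minimal sketch of the RAG pattern, in Python: a retriever picks the most relevant document and feeds it into the prompt. The `gemma_generate` stub and the document set are illustrative stand-ins, not part of any Gemma API; in a real application the stub would be replaced by an actual Gemma inference call.

```python
# Minimal retrieval-augmented generation (RAG) sketch. The gemma_generate
# stub stands in for a real Gemma call (e.g., via Hugging Face Transformers);
# everything else shows how a retriever feeds context into the prompt.

def retrieve(query, documents):
    """Return the document sharing the most words with the query."""
    query_words = set(query.lower().split())
    return max(documents, key=lambda d: len(query_words & set(d.lower().split())))

def build_prompt(query, context):
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Stand-in for an actual model call; replace with Gemma inference.
gemma_generate = lambda prompt: f"[model response to: {prompt[:40]}...]"

docs = [
    "Gemma is a family of lightweight open models from Google DeepMind.",
    "Kubernetes schedules containers across a cluster of nodes.",
]
question = "Who created the Gemma models?"
prompt = build_prompt(question, retrieve(question, docs))
answer = gemma_generate(prompt)
```

The retrieval step here is deliberately naive (word overlap); production RAG systems typically use embedding-based vector search instead.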

As of Gemma 3, the models also have multimodal capabilities, enabling users to analyze images and videos.

Gemma can also be used as a foundation for building agentic AI. As of the Gemma 3 release, the models support function calling, which is critical to agentic AI workflows.

Gemma is available through popular developer tools, including Colab and Kaggle notebooks, and frameworks such as Hugging Face Transformers, JAX, Keras 3.0 and PyTorch.

Gemma models can be deployed on Google Cloud's Vertex AI machine learning platform and Google Kubernetes Engine (GKE). Vertex AI lets application builders optimize Gemma for specific use cases, such as text generation, summarization and Q&A. Running Gemma on GKE enables developers to build their own fine-tuned models in portable containers.

Gemma is optimized to run across popular AI hardware, including Nvidia GPUs and Google Cloud TPUs. Nvidia collaborated with Google to support Gemma through the Nvidia TensorRT-LLM open source library for optimizing LLM inference and Nvidia GPUs running in the data center, in the cloud and locally on workstations and PCs.

Gemma has been pretrained on large data sets. This saves developers the cost and time of building data sets from scratch and gives them a foundation that they can customize to build their applications. Pretrained models can help build AI apps in areas such as natural language processing (NLP), speech AI, computer vision, healthcare, cybersecurity and creative arts.
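One common way developers customize a pretrained foundation without retraining all of its weights is low-rank adaptation (LoRA). The NumPy sketch below illustrates the idea conceptually; it is not Gemma-specific code, and the sizes are arbitrary.

```python
# Conceptual sketch of low-rank adaptation (LoRA), one common way to
# customize a pretrained model without retraining all of its weights.
# W stays frozen; only the small matrices A and B are trained.
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8                            # hidden size, adapter rank (r << d)

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection (starts at zero)

def adapted_forward(x):
    # Base model output plus the low-rank correction B @ (A @ x).
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d)
# With B initialized to zero, the adapter starts as an exact no-op.
assert np.allclose(adapted_forward(x), W @ x)
```

The adapter adds only 2 * d * r trainable parameters instead of d * d, which is why this style of tuning fits on modest hardware.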

Google said Gemma was trained on a diverse set of English-language web text documents to expose it to a range of linguistic styles, topics and vocabulary. Google also trained Gemma on programming code and mathematical text to help it generate code and answer code-related and mathematical questions.

Who can use Gemma?

Although Gemma can be used by anyone, it is designed mainly for developers. Because it is open source, lightweight and widely available through developer platforms and hardware devices, Gemma is said to "democratize AI."

However, there are risks to making open AI models for commercial use. Bad actors can use AI to develop applications that infringe on privacy or spread disinformation or toxic content.

Google has taken steps to address those dangers with Gemma. It released a Responsible Generative AI Toolkit for Gemma with best practices for using open AI responsibly. The toolkit provides guidance for setting safety policies and for tuning, classifying and evaluating models, along with a Learning Interpretability Tool to help developers understand NLP model behavior. It also includes a methodology for building robust safety classifiers.
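As a toy illustration of the policy-then-classify approach: define safety policy categories, then flag text that triggers them. A production safety classifier would be a trained model, and the keyword list here is purely hypothetical.

```python
# Toy illustration of a policy-based safety classifier. A real classifier
# would be a trained model, not a keyword list; this only shows the shape
# of the workflow: define a policy, then flag inputs that match it.

POLICY = {
    "violence": {"attack", "weapon"},
    "self_harm": {"self-harm"},
}

def classify(text):
    """Return the list of policy categories the text triggers."""
    words = set(text.lower().split())
    return [cat for cat, terms in POLICY.items() if words & terms]

print(classify("how to build a weapon"))   # ['violence']
print(classify("how do plants grow"))      # []
```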

When launching Gemma, Google said it was built "to assist developers and researchers in building AI responsibly." Gemma's terms of use prohibit offensive, illegal or unethical applications.

Google also claims DeepMind pretrained Gemma to omit harmful, illegal and biased content, as well as personal and sensitive information. Google released model documentation detailing Gemma's capabilities, limitations and biases.

Developers and researchers have free access to Gemma in Kaggle and Colab, an as-a-service Jupyter Notebook version. First-time Google Cloud users can receive $300 in credits when using Gemma, and researchers can apply for up to $500,000 in Google Cloud credits for their Gemma projects.

Recent updates to Gemma

Gemma has had multiple iterations since its initial debut in 2024.

Among the updates are:

Gemma 1.1

On April 5, 2024, Google released Gemma 1.1, which introduced performance improvements and bug fixes.

CodeGemma and RecurrentGemma

On April 9, 2024, Google announced the addition of two pretrained variants to the Gemma family of products: one for coding and one designed for inference and research purposes.

CodeGemma offers code completion and generation, along with instruction-following capabilities. Google cited a number of advantages to using this model, including the following:

  • Its ability to generate code, even large sections, locally or when using cloud resources.
  • Enhanced accuracy related to being "trained on 500 billion tokens of primarily English-language data."
  • Its multilanguage proficiency, as CodeGemma understands and can work with a number of programming languages, including Python, JavaScript, Java, Kotlin and C++, among others.
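Code-completion models like CodeGemma are typically prompted in a fill-in-the-middle (FIM) format, where the model generates the code that belongs between a given prefix and suffix. A sketch of building such a prompt, assuming the FIM control tokens described on CodeGemma's model card:

```python
# Sketch of a fill-in-the-middle (FIM) prompt for code completion, using the
# special tokens CodeGemma's model card describes. The model is asked to
# generate the code that belongs between the prefix and the suffix.

def build_fim_prompt(prefix, suffix):
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = build_fim_prompt(
    prefix="def area(radius):\n    return ",
    suffix="\n\nprint(area(2))",
)
# Feed `prompt` to a CodeGemma model; it should emit the missing middle,
# e.g. an expression like `3.14159 * radius ** 2`.
```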

RecurrentGemma uses recurrent neural networks and local attention to optimize memory usage. Google said that it has lower memory requirements than other models. This means it can generate longer samples on devices with limited memory, such as single GPUs or CPUs.
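The local attention idea can be sketched as a sliding-window mask: each token attends only to a fixed number of recent positions, so attention memory scales with the window size rather than the full sequence length. A minimal NumPy illustration, not RecurrentGemma's actual implementation:

```python
# Sketch of the "local attention" idea RecurrentGemma leans on: each token
# attends only to a sliding window of recent tokens, so memory grows with
# the window size rather than with the full sequence length.
import numpy as np

def local_attention_mask(seq_len, window):
    """Boolean mask: position i may attend to positions max(0, i-window+1)..i."""
    idx = np.arange(seq_len)
    return (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)

mask = local_attention_mask(seq_len=6, window=3)
print(mask.sum(axis=1))   # [1 2 3 3 3 3]: each row sees at most 3 positions
```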

Google also highlighted the model's ability to handle higher batch sizes, resulting in faster generation, and touted its non-transformer architecture as a breakthrough in deep learning research.

Both CodeGemma and RecurrentGemma are built with JAX and are compatible with JAX, PyTorch, Hugging Face Transformers and Gemma.cpp.

CodeGemma is also compatible with Keras, Nvidia NeMo, TensorRT-LLM, Optimum-Nvidia and MediaPipe, and is available on Vertex AI. Google said RecurrentGemma support for these products is coming soon.

PaliGemma

On May 14, 2024, Google released the initial version of PaliGemma, a lightweight vision language model (VLM) built on open components such as the SigLIP vision model and the Gemma language model. Inspired by PaLI-3, it is best used for captioning images and short videos, visual question answering, reading text in images, object detection and object segmentation.

PaliGemma is available on GitHub, Hugging Face models, Kaggle, Vertex AI Model Garden and Ai.nvidia.com accelerated with TensorRT-LLM. Integration is available through JAX and Hugging Face Transformers.

Gemma 2

Gemma 2 debuted with 9B and 27B variants on June 27, 2024. A 2B-parameter version was released on July 31, 2024. The expansion to 27B parameters provided more power, though Google claimed the model remained fast and efficient to run relative to its size.

With Gemma 2, Google introduced a series of architectural improvements, including grouped-query attention (GQA), a technique that improves the efficiency of attention computation.
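The idea behind GQA is that several query heads share a single key/value head, shrinking the key/value cache compared with full multi-head attention. A shape-level NumPy sketch of the technique (illustrative, not Gemma 2's actual code; a real implementation adds projections and masking):

```python
# Minimal NumPy sketch of grouped-query attention (GQA): several query heads
# share one key/value head, shrinking the KV cache versus full multi-head
# attention, where every query head has its own keys and values.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def grouped_query_attention(q, k, v, num_kv_heads):
    num_q_heads, seq, dim = q.shape
    group = num_q_heads // num_kv_heads          # query heads per KV head
    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv = h // group                          # shared KV head for this query head
        scores = softmax(q[h] @ k[kv].T / np.sqrt(dim))
        out[h] = scores @ v[kv]
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))              # 8 query heads
k = rng.standard_normal((2, 4, 16))              # but only 2 KV heads
v = rng.standard_normal((2, 4, 16))
print(grouped_query_attention(q, k, v, num_kv_heads=2).shape)   # (8, 4, 16)
```

Here the KV tensors are a quarter the size they would be with one KV head per query head, which is exactly the cache saving GQA is designed for.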

ShieldGemma

On July 31, 2024, Google debuted ShieldGemma.

ShieldGemma is an instruction-tuned model for safety evaluations of text and images. It can act as a content moderation tool, applicable to both user inputs and model outputs. ShieldGemma is part of Google's Responsible Generative AI Toolkit.

Gemma 3

Gemma 3 was announced on March 10, 2025, in 1B, 4B, 12B and 27B sizes.

With Gemma 3, Google expanded the context window to 128,000 tokens, up from Gemma 2's 8,192-token window, greatly increasing how much content the model can process at once.

Multilingual support also gets a boost, with Google claiming that the model has been pretrained to support over 140 different languages. Multimodal reasoning is also part of Gemma 3, enabling users to analyze and reason over text, images and short video content.

Gemma 3 also marks the first version of Gemma optimized for agentic AI workflows. The model supports function calling and structured output, which lets developers build automated workflows.
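The function-calling loop that such workflows build on can be sketched in a few lines: the model emits structured JSON naming a tool, and the application parses it and executes the matching function. Here the model response is hard-coded for illustration, and `get_weather` is a hypothetical tool:

```python
# Sketch of the function-calling loop behind agentic workflows: the model
# emits structured JSON naming a tool, and the application executes it.
# The JSON below is hard-coded in place of a real Gemma 3 response.
import json

def get_weather(city):
    # Hypothetical tool; a real app would call a weather API here.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

# Stand-in for model output requesting a tool call in structured form.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
print(result)   # Sunny in Paris
```

In a full agent loop, `result` would be fed back to the model so it can compose a final answer or request another tool call.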

This was last updated in March 2025
