25 of the best large language models in 2025
Large language models have been affecting search for years and have been brought to the forefront by ChatGPT and other chatbots.
Large language models are the dynamite behind the generative AI boom. However, they've been around for a while.
LLMs are black box AI systems that use deep learning on extremely large datasets to understand and generate new text. Modern LLMs began taking shape in 2014 when the attention mechanism -- a machine learning technique designed to mimic human cognitive attention -- was introduced in a research paper titled "Neural Machine Translation by Jointly Learning to Align and Translate." In 2017, that attention mechanism was honed with the introduction of the transformer model in another paper, "Attention Is All You Need."
Some of the most well-known language models today are based on the transformer model, including the generative pre-trained transformer series of LLMs and bidirectional encoder representations from transformers (BERT).
ChatGPT, which runs on a set of language models from OpenAI, attracted more than 100 million users just two months after its release in 2022. Since then, many competing models have been released. Some belong to big companies such as Google, Amazon and Microsoft; others are open source.
Constant developments in the field can be difficult to keep track of. Here are some of the most influential models, both past and present. Included in it are models that paved the way for today's leaders as well as those that could have a significant effect in the future.
This article is part of
What is Gen AI? Generative AI explained
Top current LLMs
Below are some of the most relevant large language models today. They do natural language processing and influence the architecture of future models.
BERT
BERT is a family of LLMs that Google introduced in 2018. BERT is a transformer-based model that can convert sequences of data to other sequences of data. BERT's architecture is a stack of transformer encoders and features 342 million parameters. BERT was pre-trained on a large corpus of data then fine-tuned to perform specific tasks along with natural language inference and sentence text similarity. It was used to improve query understanding in the 2019 iteration of Google search.
Claude
The Claude LLM focuses on constitutional AI, which shapes AI outputs guided by a set of principles that help the AI assistant it powers helpful, harmless and accurate. Claude was created by the company Anthropic.
There are three primary branches of Claude -- Opus, Haiku and Sonnet. The latest iteration of the Claude LLM is the Claude 3.5 Sonnet. It understands nuance, humor and complex instructions better than earlier versions of the LLM. It also has broad programming capabilities that make it well-suited for application development. In October 2024, Claude added a computer-use AI tool, that enables the LLM to use a computer like a human does. It's available via Claude.ai, the Claude iOS app and through an API.
Cohere
Cohere is an enterprise AI platform that provides several LLMs including Command, Rerank and Embed. These LLMs can be custom-trained and fine-tuned to a specific company's use case. The company that created the Cohere LLM was founded by one of the authors of Attention Is All You Need.
DeepSeek-R1
DeepSeek-R1 is an open-source reasoning model for tasks with complex reasoning, mathematical problem-solving and logical inference. The model uses reinforcement learning techniques to refine its reasoning ability and solve complex problems. DeepSeek-R1 can perform critical problem-solving through self-verification, chain-of-thought reasoning and reflection.
Ernie
Ernie is Baidu's large language model which powers the Ernie 4.0 chatbot. The bot was released in August 2023 and has garnered more than 45 million users. Ernie is rumored to have 10 trillion parameters. The bot works best in Mandarin but is capable in other languages.
Falcon
Falcon is a family of transformer-based models developed by the Technology Innovation Institute. It is open source and has multi-lingual capabilities. Falcon 2 is available in an 11 billion parameter version that provide multimodal capabilities for both text and vision.
The Falcon 1 series includes a pair of larger models with Falcon 40B and Falcon 180B. Falcon models are available on GitHub as well as on cloud provider including Amazon.
Gemini
Gemini is Google's family of LLMs that power the company's chatbot of the same name. The model replaced Palm in powering the chatbot, which was rebranded from Bard to Gemini upon the model switch. Gemini models are multimodal, meaning they can handle images, audio and video as well as text. Gemini is also integrated in many Google applications and products. It comes in three sizes -- Ultra, Pro and Nano. Ultra is the largest and most capable model, Pro is the mid-tier model and Nano is the smallest model, designed for efficiency with on-device tasks.
Among the most recent models is the Gemini 1.5 Pro update that debuted in May 2024 Gemini is available as a web chatbot, the Google Vertex AI service and via API. Early previews of Gemini 2.0 Flash became available in December 2024 with updated multimodal generation capabilities.
Gemma
Gemma is a family of open-source language models from Google that were trained on the same resources as Gemini. Gemma 2 was released in June 2024 in two sizes -- a 9 billion parameter model and a 27 billion parameter model. Gemma models can be run locally on a personal computer, and are also available in Google Vertex AI.
GPT-3
GPT-3 is OpenAI's large language model with more than 175 billion parameters, released in 2020. GPT-3 uses a decoder-only transformer architecture. In September 2022, Microsoft announced it had exclusive use of GPT-3's underlying model. GPT-3 is 10 times larger than its predecessor. GPT-3's training data includes Common Crawl, WebText2, Books1, Books2 and Wikipedia.
GPT-3 is the last of the GPT series of models in which OpenAI made the parameter counts publicly available. The GPT series was first introduced in 2018 with OpenAI's paper "Improving Language Understanding by Generative Pre-Training."
GPT-3.5
GPT-3.5 is an upgraded version of GPT-3 with fewer parameters. GPT-3.5 was fine-tuned using reinforcement learning from human feedback. GPT-3.5 is the version of GPT that powers ChatGPT. There are several models, with GPT-3.5 turbo being the most capable, according to OpenAI. GPT-3.5's training data extends to September 2021.
It was also integrated into the Bing search engine but has since been replaced with GPT-4.
GPT-4
GPT-4 , was released in 2023 and like the others in the OpenAI GPT family, it's a transformer-based model. Unlike the others, its parameter count has not been released to the public, though there are rumors that the model has more than 170 trillion. OpenAI describes GPT-4 as a multimodal model, meaning it can process and generate both language and images as opposed to being limited to only language. GPT-4 also introduced a system message, which lets users specify tone of voice and task.
GPT-4 demonstrated human-level performance in multiple academic exams. At the model's release, some speculated that GPT-4 came close to artificial general intelligence, which means it is as smart or smarter than a human. That speculation turned out to be unfounded.
GPT-4o
GPT-4 Omni (GPT-4o) is OpenAI's successor to GPT-4 and offers several improvements over the previous model. GPT-4o creates a more natural human interaction for ChatGPT and is a large multimodal model, accepting various inputs including audio, image and text. The conversations let users engage as they would in a normal human conversation, and the real-time interactivity can also pick up on emotions. GPT-4o can see photos or screens and ask questions about them during interaction.
GPT-4o can respond in 232 milliseconds, similar to human response time and faster than GPT-4 Turbo.
Granite
The IBM Granite family of models are fully open source models under the Apache v.2 license. The first iteration of the open source model models debuted in May 2024, followed by Granite 3.0 in October and Granite 3.1 in December 2024.
There are multiple variants in the Granite model family including General-purpose models (8B and 2B variants), guardrail model and Mixture-of-Experts models. While the model can be used for general purpose deployments, IBM itself is focusing deployment and optimization for enterprise use cases like customer service, IT automation and cybersecurity.
Lamda
Lamda (Language Model for Dialogue Applications) is a family of LLMs developed by Google Brain announced in 2021. Lamda used a decoder-only transformer language model and was pre-trained on a large corpus of text. In 2022, LaMDA gained widespread attention when then-Google engineer Blake Lemoine went public with claims that the program was sentient. It was built on the Seq2Seq architecture.
Llama
Large Language Model Meta AI (Llama) is Meta's LLM which was first released in 2023. The Llama 3.1 models were released in July 2024, including both a 405 billion and 70 billion parameter model.
The most recent version is Llama 3.2 which was released in September 2024, initially with smaller parameter counts of 11 billion and 90 billion.
Llama uses a transformer architecture and was trained on a variety of public data sources, including webpages from CommonCrawl, GitHub, Wikipedia and Project Gutenberg. Llama was effectively leaked and spawned many descendants, including Vicuna and Orca. Llama is available under an open license, allowing for free use of the models. Lllama models are available in many locations including llama.com and Hugging Face.
Mistral
Mistral is a family of a mixture of expert models from Mistral AI. Among the newest models is Mistral Large 2 which was first released in July 2024. The model operates with 123 billion parameters and a 128k context window, supporting dozens of languages including French, German, Spanish, Italian, and many others, along with more than 80 coding languages.
In November 2024, Mistral released Pixtral Large, a 124-billion-parameter multimodal model that can handle text and visual data. Mistral models are available via Mistral's API on its Le Platforme-managed web service.
o1
The OpenAI o1 model family was first introduced in Sept. 2024. The o1 model's focus is to provide what OpenAI refers to as - reasoning models, that can reason through a problem or query before offering a response.
The o1 models excel in STEM fields, with strong results in mathematical reasoning (scoring 83% on the International Mathematics Olympiad compared to GPT-4o's 13%), code generation and scientific research tasks. While they offer enhanced reasoning and improved safety features, they operate more slowly than previous models due to their thorough reasoning processes and come with certain limitations, such as restricted access features and higher API costs. The models are available to ChatGPT Plus and Team users, with varying access levels for different user categories.
o3
OpenAI introduced the successor model, o3, in December 2024. According to OpenAI, o3 is designed to handle tasks with more analytical thinking, problem-solving and complex reasoning and will improve o1's capabilities and performance. The o3 model is in safety testing mode and is currently not available to the public.
Orca
Orca was developed by Microsoft and has 13 billion parameters, meaning it's small enough to run on a laptop. It aims to improve on advancements made by other open source models by imitating the reasoning procedures achieved by LLMs. Orca achieves the same performance as GPT-4 with significantly fewer parameters and is on par with GPT-3.5 for many tasks. Orca is built on top of the 13 billion parameter version of Llama.
Palm
The Pathways Language Model is a 540 billion parameter transformer-based model from Google powering its AI chatbot Bard. It was trained across multiple TPU 4 Pods -- Google's custom hardware for machine learning. Palm specializes in reasoning tasks such as coding, math, classification and question answering. Palm also excels at decomposing complex tasks into simpler subtasks.
PaLM gets its name from a Google research initiative to build Pathways, ultimately creating a single model that serves as a foundation for multiple use cases. There are several fine-tuned versions of Palm, including Med-Palm 2 for life sciences and medical information as well as Sec-Palm for cybersecurity deployments to speed up threat analysis.
Phi
Phi is a transformer-based language model from Microsoft. The Phi 3.5 models were first released in August 2024.
The series includes Phi-3.5-mini-instruct (3.82 billion parameters), Phi-3.5-MoE-instruct (41.9 billion parameters), and Phi-3.5-vision-instruct (4.15 billion parameters), each designed for specific tasks ranging from basic reasoning to vision analysis. All three models support a 128k token context length.
Released under a Microsoft-branded MIT License, they are available for developers to download, use, and modify without restrictions, including for commercial purposes.
Qwen
Qwen is large family of open models developed by Chinese internet giant Alibaba Cloud. The newest set of models are the Qwen2.5 suite, which support 29 different languages and currently scale up to 72 billion parameters. These models are suitable for a wide range of tasks, including code generation, structured data understanding, mathematical problem-solving as well as general language understanding and generation.
StableLM
StableLM is a series of open language models developed by Stability AI, the company behind image generator Stable Diffusion.
StableLM 2 debuted in January 2024 initially with a 1.6 billion parameter model. In April 2024 that was expanded to also include a 12 billion parameter model. StableLM 2 supports seven languages: English, Spanish, German, Italian, French, Portuguese, and Dutch. Stability AI positions these models as offering different options for various use cases, with the 1.6B model suitable for specific, narrow tasks and faster processing while the 12B model provides more capability but requires more computational resources.
Tülu 3
Allen Institute for AI's Tülu 3 is an open-source 405 billion-parameter LLM. The Tülu 3 405B model has post-training methods that combine supervised fine-tuning and reinforcement learning at a larger scale. Tülu 3 uses a "reinforcement learning from verifiable rewards" framework for fine-tuning tasks with verifiable outcomes -- such as solving mathematical problems and following instructions.
Vicuna 33B
Vicuna is another influential open source LLM derived from Llama. It was developed by LMSYS and was fine-tuned using data from sharegpt.com. It is smaller and less capable that GPT-4 according to several benchmarks, but does well for a model of its size. Vicuna has only 33 billion parameters, whereas GPT-4 has trillions.
LLM precursors
Although LLMs are a recent phenomenon, their precursors go back decades. Learn how recent precursor Seq2Seq and distant precursor ELIZA set the stage for modern LLMs.
Seq2Seq
Seq2Seq is a deep learning approach used for machine translation, image captioning and natural language processing. It was developed by Google and underlies some of their modern LLMs, including LaMDA. Seq2Seq also underlies AlexaTM 20B, Amazon's large language model. It uses a mix of encoders and decoders.
Eliza
Eliza was an early natural language processing program created in 1966. It is one of the earliest examples of a language model. Eliza simulated conversation using pattern matching and substitution. Eliza, running a certain script, could parody the interaction between a patient and therapist by applying weights to certain keywords and responding to the user accordingly. The creator of Eliza, Joshua Weizenbaum, wrote a book on the limits of computation and artificial intelligence.