Tech Accelerator What is GenAI? Generative AI explained

Prev Next

Definition

What is retrieval-augmented generation (RAG) in AI?

Alexander S. Gillis

By

Alexander S. Gillis, Technical Writer and Editor

Published: Dec 30, 2024

Retrieval-augmented generation (RAG) is an AI framework that retrieves data from external sources of knowledge to improve the quality of responses. This natural language processing (NLP) technique is commonly used to make large language models (LLMs) more accurate and up to date.

LLMs are AI models that power chatbots like OpenAI's ChatGPT and Google Gemini. LLMs can understand, summarize, generate and predict new content. However, they can still be inconsistent and fail at some knowledge-intensive tasks -- especially tasks that are outside their initial training data or those that require up-to-date information and transparency about how they make their decisions. When this happens, the LLM can return false information, also known as an AI hallucination.

When an LLM's trained data isn't enough, the quality of its responses can be improved by retrieving information from external sources. Retrieving information from an online source, for example, enables the LLM to access current information that it wasn't initially trained on. This process has become important for foundation AI models, chatbots and Q&A systems, as they need to respond to user queries with specific, up-to-date and accurate information.

What does RAG do and why is it important?

LLMs are a key component of modern AI systems, as they help enable AI to understand and generate human language. However, LLMs have several constraints and knowledge gaps. They're commonly trained offline, making the model unaware of any data that's created after it was trained. RAG retrieves data from outside the LLM, which then augments the LLM's response by adding relevant retrieved data to the generative response.

This article is part of

What is GenAI? Generative AI explained

Which also includes:
9 top generative AI tool categories for 2026
Will AI replace jobs? 18 job types that might be affected
30 of the best large language models in 2026

This process helps reduce any apparent knowledge gaps and AI hallucinations. This is important in fields that require as much current and accurate information as possible, such as healthcare and customer support.

How to use RAG with LLMs

RAG combines a text generator model with an information retrieval component. This retrieval component searches for external knowledge, which is data gathered from anywhere outside the LLM's original training data. Information can be retrieved from several places, such as online sources, application programming interfaces, databases and document repositories.

Using the example of a chatbot, once a user inputs a prompt, RAG summarizes that prompt using vector embeddings -- which are commonly managed in vector databases -- keywords or semantic data. The converted data is sent to a search platform to retrieve the requested data, which is then sorted based on relevance.

The LLM then synthesizes the retrieved data with the augmented prompt and its internal training data to create a generated response that can be passed to the chatbot. Depending on the chatbot, sourced links can also be provided to the user.

Diagram showing how an LLM using RAG works. — An LLM using RAG can pull from both internal and external data to return a response for users, ensuring it provides relevant information.

What are the benefits of RAG?

RAG offers the following benefits:

Provides current information. RAG pulls information from relevant, reliable and up-to-date sources.
Increases user trust. Depending on the AI implementation, users can access the model's sources, which promotes transparency and trust in the content and lets users verify its accuracy.
Reduces AI hallucinations. Because LLMs are grounded to external data, the model has less chance to make up or return incorrect information.
Reduces computational and financial costs. Organizations don't have to spend time and resources to continuously train the model on new data.
Synthesizes information. RAG synthesizes data by combining relevant information from retrieval and generative models to produce a response.
Easier to train. Because RAG uses retrieved knowledge sources, the need to train the LLM on a massive amount of data is reduced.
Can be used for multiple tasks. Aside from chatbots, RAG can be fine-tuned for various specific use cases, such as text summarization and dialogue systems.

What are the limitations of RAG?

While RAG has numerous benefits, it also comes with its own challenges and limitations, including the following:

Accuracy and data quality. Because RAG pulls from external sources, the accuracy of its response is only as good as the data it pulls from. RAG itself can't determine the accuracy of the data it gathers. This means that the accuracy of the retrieved data depends on the quality and reliability of its data sources.
Computational cost. RAG requires a model and retrieval component that can integrate retrieved data efficiently, which is a resource-intensive process.
Explainability. Some systems might not be designed to let users know where data was sourced from, which can affect user trust.
Latency. Adding a retrieval step to an LLM can add to its latency. This is especially true if the retrieval mechanism must search through larger knowledge bases.

Retrieval augmented generation vs. semantic search

Semantic search is a data searching technique that focuses on understanding the intent and contextual meanings behind search queries. It does this by applying NLP and machine learning algorithms to various factors, such as the terms used in a query, previous searches and geographic location. This is a more effective method than keyword-based searches, which attempt to match exact words or phrases in a query. Semantic search is widely used in web search engines, content management systems, chatbots and e-commerce platforms.

While RAG attempts to improve the quality of responses from an LLM by using externally sourced data, semantic search instead focuses on improving search accuracy by understanding the search query and the intent behind it.

Both can serve complementary purposes. Semantic search can improve the quality of RAG-based queries, as it focuses on gaining a deeper understanding of searches. This leads to RAG systems producing more accurate and meaningful outputs.

On its own, RAG is ideal for applications that need the most up-to-date information, while semantic search is ideal for applications where understanding user intent to improve search accuracy is paramount.

History of RAG

In the early 1970s, researchers started experimenting with and creating question-answering systems that could access text on specific topics. The process, which is known as text mining, analyzes large amounts of unstructured text and is aided by software that can identify data attributes such as concepts, patterns, topics and keywords. In the 1990s, Ask Jeeves -- a precursor to Google, now called Ask.com -- popularized question-answering systems.

Google's paper titled "Attention Is All You Need," published in 2017, introduced transformer architecture, marking a turning point in the ability to create and train both scalable and efficient LLMs. The following year, OpenAI's GPT was released, with GPT standing for Generative Pre-trained Transformer.

It wasn't until 2020 that RAG as a framework was introduced. A team with Facebook, now Meta, working at a London AI lab developed a way to condense more knowledge into the parameters of an LLM. They integrated a retrieval system with an LLM to enable the model to create more dynamic and grounded responses.

RAG continues to grow and evolve in its use. It's implemented in many major AI chatbots, including ChatGPT, for example.

Learn more about generative AI models, such as VAEs, GANs, diffusion, transformers and NeRFs.

Continue Reading About What is retrieval-augmented generation (RAG) in AI?

Generative AI challenges that businesses should consider

Generative AI ethics: Biggest concerns and risks

Assessing different types of generative AI applications

Pros and cons of AI-generated content

Cohesity Turing aims AI tools at backup and ransomware

Dig Deeper on AI technologies

Search Business Analytics

Why ethical use of data is so important to enterprises
Enterprises that don't use data ethically have a lot to lose. To maintain their businesses' trustworthiness and value, executives...
Domo adds App Catalyst to platform to aid AI development
By combining natural language code generation with enterprise-grade security and governance, the vendor aims to help customers ...
The future of business intelligence: 10 top trends in 2026
Here are 10 key trends affecting the current state and future direction of BI initiatives that analytics leaders should be aware ...

Search CIO

Inside a CIO's mind: Mastering time and knowing the business
CIO Sean McCormack explains how he balances strategy, vendors and frontline engagement -- and why his to-do list lives on his ...
CIOs are feeling the pressure of the AI leadership gap
In this Q&A, Wendy Lynch, founder of Analytic Translator, discusses how CIOs need to close a leadership gap to overcome the huge ...
Why companies should be sustainable and how IT can help
Pressure is mounting for the business sector to address its environmental footprint and become more sustainable. Here's a look at...

Search Data Management

Databricks launches PostgreSQL Lakebase to aid AI developers
Resulting from the $1B acquisition of Neon, the database built for AI workloads -- including separate compute and storage -- is ...
Pentaho update aids data integration, semantic modeling
The vendor's latest platform update aims to speed, simplify and better govern workloads to help customers build a trusted ...
Snowflake launches new AI tools, unveils OpenAI partnership
New features such as an agent-powered code generator and automated semantic modeling simplify developing cutting-edge ...

Search ERP

Who's really governing enterprise systems: IT or leaders?
Across ERP, HR software and mobile platforms, governance decisions are being set earlier, often before organizations realize ...
C-suite should make AI data management the 2026 ERP priority
Aligning data lakehouses with those of ERP vendors and data partners is important, but it won't be enough without silo-busting ...
8 ERP security best practices for modern ERP environments
As supply chain attacks continue, ERP security requires strong authentication, regular patching, monitoring and incident response...

Close