
What is chain-of-thought prompting (CoT)? Examples and benefits

Chain-of-thought (CoT) prompting is a prompt engineering technique that aims to improve language models' performance on tasks requiring logic, calculation and decision-making by structuring the input prompt in a way that mimics human reasoning.

To construct a CoT prompt, a user typically appends an instruction such as "Describe your reasoning in steps" or "Explain your answer step by step" to the end of their query to a large language model (LLM). In essence, this prompting technique asks the LLM to not only generate a result, but also detail the series of intermediate steps that led to that answer.
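As a rough illustration, the snippet below shows one way to build such a prompt programmatically -- appending a step-by-step instruction to a question and sending it through the OpenAI Python SDK. The model name and client are illustrative; any chat-capable model or library could be swapped in.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

question = (
    "A farmer must ferry a wolf, a goat and a cabbage across a river, "
    "but the boat holds only the farmer and one item at a time. "
    "How can everything get across safely?"
)

# The CoT instruction is simply appended to the end of the query.
cot_prompt = question + " Describe your reasoning step by step."

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; any chat-capable model works
    messages=[{"role": "user", "content": cot_prompt}],
)
print(response.choices[0].message.content)
```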

Guiding the model to articulate these intermediate steps has shown promising results. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" is a seminal paper by the Google Brain -- now DeepMind -- research team, presented at the 2022 NeurIPS conference. The researchers found that CoT prompting outperformed standard prompting techniques on a range of arithmetic, commonsense and symbolic reasoning benchmarks.

How does CoT prompting work?

CoT prompting harnesses LLMs' sophisticated ability to generate fluent language to simulate aspects of human cognitive processing, such as planning and sequential reasoning.

When people are confronted with a challenging problem, they often break it down into smaller, more manageable pieces. For example, solving a complex math equation typically involves several substeps, each of which is essential to arriving at the final correct answer. CoT prompting asks an LLM to mimic this process of decomposing a problem and working through it step by step -- essentially, asking the model to "think out loud," rather than simply providing a solution.

The screenshot below shows an example of how chain-of-thought prompting works. The user presents OpenAI's ChatGPT with a classic river-crossing logic puzzle, adding the phrase "Describe your reasoning step by step" at the end of the prompt. When the chatbot responds, it sequentially works through the problem, describing each crossing leading up to the final answer.

Screenshot: GPT-4 provides a step-by-step solution to a logic puzzle in response to a chain-of-thought prompt.

The following are other examples of CoT prompts:

  • "John has one pizza, cut into eight equal slices. John eats three slices, and his friend eats two slices. How many slices are left? Explain your reasoning step by step."
  • "Alice left a glass of water outside overnight when the temperature was below freezing. The next morning, she found the glass cracked. Explain step by step why the glass cracked."
  • "If all roses are flowers, and some flowers fade quickly, can we conclude that some roses fade quickly? Explain your reasoning in steps."
  • "A classroom has two blue chairs for every three red chairs. If there are a total of 30 chairs in the classroom, how many blue chairs are there? Describe your reasoning step by step."

Different approaches to CoT prompting

CoT prompting has multiple variants, each of which uses a different approach to getting LLMs to explain their outputs:

  • Auto-CoT. Automatic CoT reduces the manual effort of writing worked examples by hand. Instead of a user crafting demonstrations, the LLM is prompted to generate its own reasoning chains for a set of sample questions -- typically with an instruction such as "Let's think step by step" -- and those generated demonstrations are then reused to guide the model's reasoning on new queries.
  • Multimodal CoT. Models capable of processing inputs besides text -- such as audio, images and video -- are known as multimodal AI. An example of multimodal CoT would be asking such a model to examine an image when explaining and justifying its output.
  • Zero-shot CoT. With this approach, the user doesn't provide an LLM with any examples for it to reference, instead asking it to "show its work" and explain how it achieved its output. This process is efficient, but not as effective for complex inputs; zero-shot chain of thought is best suited for simpler problems.
  • Least-to-most CoT. With this approach, a user breaks a large problem into smaller subproblems and sends each one to the LLM sequentially. The LLM can then solve each subsequent subproblem more easily using the answers to previous subproblems for reference, as in the sketch following this list.
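A minimal sketch of least-to-most prompting follows. It assumes a hypothetical `ask` helper that wraps the OpenAI chat API (any chat client could be substituted) and reuses the chair-counting problem from the examples above, split into two subproblems.

```python
from openai import OpenAI

client = OpenAI()  # any chat client could stand in here

def ask(prompt: str) -> str:
    """Send a single prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# Least-to-most: solve subproblems in order, feeding each answer forward.
subproblems = [
    "A classroom has two blue chairs for every three red chairs. "
    "What fraction of the chairs are blue? Explain step by step.",
    "Using that fraction, how many of the 30 chairs in the classroom are blue? "
    "Explain step by step.",
]

context = ""
for sub in subproblems:
    answer = ask(context + sub)
    # Carry the solved subproblem forward so later steps can reference it.
    context += f"Q: {sub}\nA: {answer}\n\n"

print(context)
```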

Few-shot prompting and traditional, or standard, prompting are related to CoT but aren't considered CoT. Standard prompting doesn't ask an LLM to lay out complex reasoning or justify its output; producing an answer is all that matters in the standard approach. Few-shot prompting means a user provides examples of desired outputs for similar inputs -- such as answers to similar math problems -- to help guide the LLM. Unless those examples also include the intermediate reasoning steps, however, it isn't classified as a CoT approach.
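To make the distinction concrete, here is a hedged sketch of the two prompt styles side by side: a standard few-shot prompt whose worked examples show only final answers, and a few-shot CoT prompt, in the spirit of the Google Brain paper, whose examples also spell out the intermediate reasoning. The questions are illustrative.

```python
# Standard few-shot prompt: worked examples show only the final answer.
standard_few_shot = """\
Q: John has one pizza cut into eight equal slices. John eats three slices and his friend eats two. How many slices are left?
A: 3

Q: A classroom has two blue chairs for every three red chairs. If there are 30 chairs in total, how many are blue?
A:"""

# Few-shot CoT prompt: the same examples, but with the reasoning spelled out.
few_shot_cot = """\
Q: John has one pizza cut into eight equal slices. John eats three slices and his friend eats two. How many slices are left?
A: The pizza starts with 8 slices. John eats 3, leaving 5. His friend eats 2, leaving 3. The answer is 3.

Q: A classroom has two blue chairs for every three red chairs. If there are 30 chairs in total, how many are blue?
A:"""

# Either string can be sent to a chat model as a single user message;
# only the CoT version nudges the model to reply with its own reasoning chain.
```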

Advantages of CoT prompting

CoT prompting offers several advantages:

  • Better responses. LLMs can only take in a limited amount of information at one time. Breaking down complex problems into simpler subtasks helps mitigate this issue. It lets LLMs process those smaller components individually, leading to more accurate and precise model responses.
  • Expanded knowledge base. CoT prompting takes advantage of LLMs' extensive pool of general knowledge. LLMs are exposed to a wide array of explanations, definitions and problem-solving examples during their training on vast textual data sets, encompassing books, articles and much of the open internet. CoT prompts tap into this reservoir of stored knowledge by triggering the model to call on and apply relevant information.
  • Logical reasoning. The technique directly targets a common limitation of LLMs: difficulty with logical reasoning. Although LLMs excel at generating coherent, relevant text, they weren't designed primarily to solve problems or perform deduction, so they often struggle with multistep reasoning, especially as problems grow more complex. CoT prompting addresses this issue by guiding the model to take a structured reasoning approach. It directs the model to construct a logical pathway from the original prompt or problem statement to the final answer, reducing the likelihood of logical missteps and oversights.
  • Debugging. CoT prompting assists with model debugging and improvement by providing transparency in the process by which a model arrives at its answer. Because the prompts ask the model to explicitly delineate a reasoning process, they give model testers and developers better insight into how the model reached a particular conclusion. This, in turn, makes it easier to identify and correct errors when refining the model.
  • Fine-tuning. Developers can combine CoT prompting with fine-tuning to enhance LLM reasoning capabilities. For example, fine-tuning a model on a training data set containing curated examples of step-by-step reasoning and logical deduction can improve the effectiveness of CoT prompting. A sample training record appears after this list.
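As a rough example of what such training data might look like, the snippet below writes one record whose target output includes step-by-step reasoning. The JSONL schema shown (a "messages" list per line) follows the OpenAI-style chat fine-tuning format and is an assumption here; other providers expect different schemas.

```python
import json

# One hypothetical fine-tuning record whose assistant reply demonstrates
# step-by-step reasoning rather than a bare answer.
record = {
    "messages": [
        {
            "role": "user",
            "content": "John has one pizza cut into eight equal slices. John eats "
                       "three slices and his friend eats two. How many slices are "
                       "left? Explain your reasoning step by step.",
        },
        {
            "role": "assistant",
            "content": "The pizza starts with 8 slices. John eats 3, leaving 5. "
                       "His friend eats 2, leaving 3. So 3 slices are left.",
        },
    ]
}

# Append the record as one line of a JSONL training file.
with open("cot_finetune.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```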

Limitations of CoT prompting

Importantly, as the Google research team highlighted in its paper, the semblance of reasoning that CoT prompts elicit from LLMs doesn't mean the model is thinking. It's essential to remember that the model is a deep learning neural network trained to predict text sequences based on probability. There's no evidence to suggest that LLMs are capable of reasoning as people do. This distinction is crucial for users to understand the limitations of LLMs and maintain realistic expectations about their capabilities.

LLMs lack consciousness and metacognition, and their general knowledge derives solely from their training data -- reflecting that data set's errors, gaps and biases. Although an LLM can accurately mimic the structure of logical reasoning, this doesn't mean its conclusions are always accurate. CoT prompts serve as a valuable organizing mechanism for LLM output, but an LLM could nevertheless present a coherent, well-structured output that contains logical errors and oversights.

Techniques such as retrieval-augmented generation show promise for mitigating this limitation. RAG lets an LLM access an external source -- such as a vetted database or the internet -- in real time when asked to deliver information. In this way, RAG eliminates the need for the LLM to rely solely on the internal knowledge base gleaned from its training data, which might be flawed or incomplete.
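A simple way to picture combining RAG with CoT prompting is sketched below: retrieved passages are prepended to the question so the model reasons step by step over vetted text rather than relying only on what it memorized during training. The `retrieve` helper is hypothetical and stands in for whatever vector store or search index an organization uses; the `ask` helper wraps a chat-completion call and is likewise illustrative.

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def retrieve(query: str, k: int = 3) -> list[str]:
    """Hypothetical placeholder: query a vector database or search index."""
    raise NotImplementedError

question = "Which data-retention rules apply to archived customer emails?"
passages = retrieve(question)

# Ground the CoT prompt in retrieved text, then ask for step-by-step reasoning.
prompt = (
    "Answer using only the context below.\n\n"
    + "\n\n".join(passages)
    + f"\n\nQuestion: {question}\nExplain your reasoning step by step."
)
print(ask(prompt))
```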

However, while RAG can improve the accuracy and timeliness of an LLM's outputs, it doesn't inherently address the problem of logical reasoning. Deduction and reasoning require more than just factual recall; they also involve the ability to derive conclusions through logic and analysis. These are aspects of AI performance that are more closely related to the algorithmic architecture and training of the LLM itself.

Also, the scalability of CoT prompting remains in question. Although the underlying principle of sequential, multistep reasoning applies broadly across AI and machine learning, CoT prompting itself depends on a model's ability to generate fluent, structured language -- which, for now, limits the technique to LLMs.

LLMs' large size requires significant data, compute and infrastructure, which raises issues around accessibility, efficiency and sustainability. In response to this problem, AI researchers have developed small language models, which -- while less powerful than LLMs -- perform competitively on various language tasks and require fewer computational resources. However, it remains to be seen whether the benefits of CoT prompting are transferable to smaller models, as reducing their capabilities risks compromising their problem-solving effectiveness.

It's important to keep in mind that CoT prompting is a technique for using an existing model more effectively, not a training method. While these prompts can help users elicit better results from pretrained LLMs, prompt engineering isn't a cure-all and can't fix model limitations that should have been handled during the training stage.

CoT prompting vs. prompt chaining

Chain-of-thought prompting and prompt chaining sound similar and are both prompt engineering techniques, but they differ in some important ways.

CoT prompting asks the model to describe, within a single response, the intermediate steps it took to reason its way to a final answer. This is useful for complex tasks that require detailed explanation, planning and reasoning, such as math problems and logic puzzles, where explaining the thought process is essential to fully understanding the solution.

In contrast, prompt chaining involves an iterative sequence of prompts and responses, in which each subsequent prompt is formulated based on the model's output in response to the previous one. This makes prompt chaining a useful technique for more creative, exploratory tasks that involve gradual refinement, such as generating detailed narratives and brainstorming ideas.

The fundamental difference between CoT prompting and prompt chaining lies in iteration and interactivity. CoT prompting presents the reasoning process within a single detailed, self-contained response. Prompt chaining takes a more dynamic approach, with multiple rounds of interaction that enable users to develop an idea over time.
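The contrast is easy to see in code. The sketch below shows a simple prompt chain, where each new prompt is built from the model's previous output; a CoT prompt would instead pack the whole request, including the "explain step by step" instruction, into a single call. The `ask` helper wraps a chat-completion call and is illustrative.

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# Step 1: brainstorm.
ideas = ask("List three plot ideas for a short story about a lighthouse keeper.")

# Step 2: refine, feeding the previous response into the next prompt.
outline = ask(f"Pick the most original of these ideas and outline it in five scenes:\n{ideas}")

# Step 3: draft, again building on the prior output.
draft = ask(f"Write the opening paragraph for scene one of this outline:\n{outline}")
print(draft)
```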

Use cases of CoT prompting

CoT is more than just an AI technique for LLM users and tech enthusiasts. There are real-world uses for CoT that help organizations perform tasks such as the following:

  • Understanding regulations. Legal experts can use chain-of-thought prompting to direct an LLM to explain new or existing regulations -- such as laws surrounding data privacy -- and how those apply to their organization. This approach can also apply to writing new internal policies.
  • Educating new employees. An LLM can teach an organization's new hires about its internal policies. For example, a new hire can use CoT prompting to ask an LLM which policies would apply to a specific circumstance and why.
  • Answering customer queries. AI-powered chatbots are commonly used across industries for customer interactions. For example, when a customer must complete a complex troubleshooting process, a chatbot can explain how and why to perform each step.
  • Managing logistics and supply chains. A logistics or transportation company could rely on this technique when asking an LLM to craft a better logistics strategy. The LLM would have to explain its answers and how they optimize logistics operations.
  • Creating original content. Generative AI tools can draft and organize text in a way that's easy for readers to follow, and they can explain why they structured it that way. Long-form content, such as complex scientific research papers, could benefit from this approach.

CoT prompting is one of multiple advanced strategies involved in prompt engineering.

