
Summary of Prompt Repetition Improves Non-Reasoning LLMs

This document was created by John MacCormick and dictated into ChatGPT (GPT-5.3). The model reformatted and lightly edited the content for clarity, correcting minor errors and improving readability while preserving the original meaning.


This is a summary of Prompt Repetition Improves Non-Reasoning LLMs by Leviathan, Kalman, and Matias (Google Research, December 2025).

The core idea of the paper is that simply repeating the prompt sent to a large language model (LLM) can improve performance when explicit reasoning is disabled. The effect holds across a range of problem types and appears in all seven LLMs evaluated in the study.
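The technique itself is trivial to apply. As a minimal sketch, one can duplicate the prompt string before sending it to a model; note that the exact separator and number of repetitions used by the authors are assumptions here, not details taken from the paper:

```python
def repeat_prompt(prompt: str, n: int = 2, separator: str = "\n\n") -> str:
    """Return n verbatim copies of the prompt joined by the separator.

    The paper's exact formatting (separator, copy count) may differ;
    this simply illustrates the idea of repeating the prompt verbatim.
    """
    return separator.join([prompt] * n)

# The doubled prompt is what would be sent to the model in place of
# the original single-copy prompt.
doubled = repeat_prompt("What is the capital of France?")
```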


Variation in Performance Gains

The magnitude of improvement varies significantly depending on the task.


Example Benchmark: OpenBookQA

One illustrative example is the OpenBookQA benchmark, introduced by Todor Mihaylov et al. (2018) in the paper “Can a Suit of Armor Conduct Electricity?”.

This benchmark consists of multiple-choice questions that combine core science facts with broad common-sense knowledge.

Human performance on this benchmark is typically around 92% accuracy.

Example Questions

A representative question from the dataset: “Which of these would let the most heat travel through? (A) a new pair of jeans, (B) a steel spoon in a cafeteria, (C) a cotton candy at a store, (D) a calvin klein cotton hat.”

Experimental Findings on OpenBookQA

The authors conducted two experiments using the OpenBookQA dataset:

1. Reordered Prompts (Options First)

2. Standard Ordering (Question First)
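The two orderings can be illustrated with a small formatting helper. The layout, option labels, and the sample question below are invented for illustration; they are not the paper's exact prompt template:

```python
def format_mcq(question: str, options: dict[str, str],
               options_first: bool = False) -> str:
    """Render a multiple-choice prompt with the answer options placed
    either before (reordered) or after (standard) the question."""
    opts = "\n".join(f"({label}) {text}" for label, text in options.items())
    return f"{opts}\n{question}" if options_first else f"{question}\n{opts}"

# Hypothetical question and options, for illustration only.
q = "Which of these objects conducts electricity?"
choices = {"A": "a wooden spoon", "B": "a suit of armor",
           "C": "a cotton shirt", "D": "a rubber boot"}
standard = format_mcq(q, choices)                        # question first
reordered = format_mcq(q, choices, options_first=True)   # options first
```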


Custom Task: “NameIndex”

The authors also designed custom tasks to highlight cases where prompt repetition has especially strong effects. One such task is called NameIndex.

Task Description

A typical prompt takes the form:

“Here is a list of names…”

This is followed by a list (e.g., 50 full names, each with a given name and family name), and then a question such as:

“What is the 25th name?”
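A prompt of this shape is easy to generate programmatically. The following sketch builds a synthetic NameIndex-style instance; the name pools and exact wording are assumptions for illustration, not the authors' generator:

```python
import random

def make_nameindex_prompt(num_names: int = 50, index: int = 25,
                          seed: int = 0) -> str:
    """Build a synthetic NameIndex-style prompt: a list of full names
    followed by a positional question. Name pools are illustrative."""
    rng = random.Random(seed)
    given = ["Alice", "Bob", "Carol", "David", "Erin", "Frank", "Grace", "Henry"]
    family = ["Adams", "Baker", "Clark", "Diaz", "Evans", "Foster", "Gray", "Hill"]
    names = [f"{rng.choice(given)} {rng.choice(family)}" for _ in range(num_names)]
    lines = "\n".join(names)
    # Naive ordinal suffix ("25th"); correct for the default index.
    return f"Here is a list of names:\n{lines}\nWhat is the {index}th name?"

prompt = make_nameindex_prompt()
```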

Results

Substantial improvements were observed across most models when the prompt was repeated.


Effect of Reasoning

When reasoning capabilities were enabled, prompt repetition had little to no effect on most models and tasks.

The authors hypothesize that this is because reasoning-enabled models often implicitly repeat or restate the prompt as part of their internal or external reasoning process. As a result, explicitly repeating the prompt provides little additional benefit.