Reflection is perhaps the most accessible and immediately impactful agentic design pattern. The core idea is simple: ask the LLM to examine its own output, identify weaknesses, and improve it.
This mirrors how humans work. A skilled writer doesn’t submit their first draft. They write, re-read, critique, and revise. Reflection gives LLMs this same capability, and the results can be dramatic — turning mediocre first-draft outputs into polished, high-quality results.
The simplest form of reflection is a three-step loop: generate a draft, critique it, and revise it.
```
┌──────────┐     ┌──────────────┐     ┌──────────┐
│ Generate │────►│   Reflect    │────►│  Revise  │
│ (Draft)  │     │  (Critique)  │     │(Improved)│
└──────────┘     └──────────────┘     └─────┬────┘
     ▲                                      │
     │                                      │
     └──────────────────────────────────────┘
                (repeat N times)
```
In self-reflection, a single LLM is used for both generation and critique. The key is crafting the critique prompt to be specific and constructive.
```python
# See code/reflection.py for the full implementation
def self_reflect(llm, task, max_rounds=3):
    """Generate output and iteratively improve it through self-reflection."""
    # Step 1: Initial generation
    draft = llm.generate(f"Perform the following task:\n{task}")
    for _ in range(max_rounds):
        # Step 2: Self-critique
        critique = llm.generate(
            f"You produced the following output for the task '{task}':\n\n"
            f"{draft}\n\n"
            f"Critically evaluate this output. Identify:\n"
            f"1. Errors or inaccuracies\n"
            f"2. Missing information\n"
            f"3. Areas where clarity could be improved\n"
            f"4. Logical gaps or unsupported claims\n"
            f"Be specific and constructive."
        )
        # Step 3: Revise based on critique
        draft = llm.generate(
            f"Original task: {task}\n\n"
            f"Your previous output:\n{draft}\n\n"
            f"Critique of your output:\n{critique}\n\n"
            f"Revise your output to address the critique. "
            f"Produce an improved version."
        )
    return draft
```
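These sketches assume an `llm` object exposing a single `generate(prompt, system=None)` method that returns a string. A minimal stub of that interface (hypothetical, purely for illustration; the class name and canned-response design are not from any library) makes the loops testable without network calls:

```python
# Hypothetical stub of the `llm` interface assumed by these sketches; any
# real client (OpenAI, Anthropic, a local model) can be wrapped to match it.
class ScriptedLLM:
    """Returns canned responses in order -- handy for exercising loops offline."""

    def __init__(self, responses):
        self._responses = iter(responses)

    def generate(self, prompt, system=None):
        # A real implementation would call a model API here.
        return next(self._responses)
```

Feeding `self_reflect` a `ScriptedLLM` with seven responses (one draft, then three critique/revision pairs for `max_rounds=3`) walks the full loop deterministically.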
A more effective variant uses two distinct agents — a generator and a critic — with different system prompts optimized for their roles.
```python
# See code/reflection.py for the full implementation
def two_agent_reflect(generator_llm, critic_llm, task, max_rounds=3):
    """Use separate generator and critic agents for reflection."""
    generator_prompt = (
        "You are an expert writer. Produce clear, accurate, "
        "well-structured output."
    )
    critic_prompt = (
        "You are a demanding but constructive editor. "
        "Your job is to find flaws, gaps, and areas for improvement. "
        "Be specific. Point to exact phrases or sections that need work."
    )
    draft = generator_llm.generate(task, system=generator_prompt)
    for _ in range(max_rounds):
        critique = critic_llm.generate(
            f"Review this output for the task '{task}':\n\n{draft}",
            system=critic_prompt,
        )
        if "no major issues" in critique.lower():
            break  # Critic is satisfied
        draft = generator_llm.generate(
            f"Task: {task}\nYour draft:\n{draft}\n"
            f"Editor feedback:\n{critique}\n"
            f"Revise accordingly.",
            system=generator_prompt,
        )
    return draft
```
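The substring check on "no major issues" works, but it is brittle: a critic can phrase approval many ways. One sturdier option (a sketch; the `VERDICT:` convention is an assumption of this example, not part of any API) is to instruct the critic, in its system prompt, to end every review with a machine-readable verdict line, and parse that instead:

```python
def critic_approved(critique: str) -> bool:
    """Check for an explicit verdict line at the end of a critique.

    Assumes the critic's system prompt instructs it to finish every review
    with a line of the form 'VERDICT: APPROVED' or 'VERDICT: REVISE'.
    """
    for line in reversed(critique.strip().splitlines()):
        if line.strip().upper().startswith("VERDICT:"):
            return "APPROVED" in line.upper()
    return False  # No verdict found: safest to assume more revision is needed
```

Replacing the `"no major issues"` test with `if critic_approved(critique): break` makes the stopping condition explicit on both sides of the conversation.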
Reflection becomes even more powerful when the critic has access to tools that provide ground truth. Instead of relying on the LLM’s judgment alone, the system can verify outputs against real-world data.
For code generation, the most effective reflection strategy is to run the generated code against test cases:
```python
# See code/reflection.py for the full implementation
def code_reflection(llm, task, test_cases, max_rounds=5):
    """Generate code and refine it using test execution feedback."""
    code = llm.generate(f"Write Python code to: {task}")
    for _ in range(max_rounds):
        # Run the code against test cases
        results = run_tests(code, test_cases)
        if results.all_passed:
            return code  # All tests pass -- we're done
        # Reflect on failures
        code = llm.generate(
            f"Your code for '{task}':\n```python\n{code}\n```\n\n"
            f"Test results:\n{results.summary}\n\n"
            f"Fix the code to pass all tests."
        )
    return code
```
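The `run_tests` helper is assumed rather than defined above. A minimal sketch follows, under two stated assumptions: test cases are `(function_name, args, expected)` tuples, and execution is not sandboxed, which a production harness would need before running model-generated code.

```python
from dataclasses import dataclass

# Hypothetical sketch of the run_tests helper used by code_reflection.
@dataclass
class TestResults:
    all_passed: bool
    summary: str

def run_tests(code, test_cases):
    """Execute generated code and check each (fn_name, args, expected) case."""
    namespace = {}
    try:
        exec(code, namespace)  # caution: no sandboxing in this sketch
    except Exception as e:
        return TestResults(False, f"Code failed to execute: {e}")
    lines, passed = [], 0
    for fn_name, args, expected in test_cases:
        try:
            got = namespace[fn_name](*args)
            ok = got == expected
        except Exception as e:
            got, ok = f"raised {e!r}", False
        passed += ok
        lines.append(f"{'PASS' if ok else 'FAIL'}: {fn_name}{args} -> {got}")
    return TestResults(passed == len(test_cases), "\n".join(lines))
```

The per-case `summary` lines matter as much as the pass/fail flag: they are what the LLM reads when it reflects on a failure.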
For research tasks, the critic can verify claims by searching for supporting evidence:
```python
# See code/reflection.py for the full implementation
def research_reflection(llm, topic, search_tool, max_rounds=3):
    """Generate research content and verify claims."""
    report = llm.generate(f"Write a research summary about: {topic}")
    for _ in range(max_rounds):
        # Extract key claims, one per line, then parse into a list
        claims_text = llm.generate(
            f"Extract the 3 most important factual claims from:\n{report}\n"
            f"List one claim per line, with no numbering."
        )
        claims = [line.strip() for line in claims_text.splitlines() if line.strip()]
        # Verify each claim against search results
        for claim in claims:
            evidence = search_tool.search(claim)
            if not evidence.supports_claim:
                report = llm.generate(
                    f"Your report:\n{report}\n\n"
                    f"This claim could not be verified: '{claim}'\n"
                    f"Search results: {evidence}\n"
                    f"Revise the report to correct or remove "
                    f"unverified claims."
                )
    return report
```
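The `search_tool` and the `evidence` object it returns are likewise assumed. The sketch below (a toy keyword matcher, purely illustrative; a real system would call a web-search or retrieval API) shows the minimal interface `research_reflection` relies on: a `.search(claim)` method returning an object with a `supports_claim` flag and a readable string form.

```python
from dataclasses import dataclass

# Hypothetical sketch of the search-tool interface assumed above.
@dataclass
class Evidence:
    """Minimal evidence object: a flag plus snippets for the revision prompt."""
    supports_claim: bool
    snippets: list

    def __str__(self):
        status = "supported" if self.supports_claim else "unverified"
        return f"[{status}] " + " | ".join(self.snippets)

class KeywordSearchTool:
    """Toy verifier: a claim counts as supported if it shares words with a known fact."""

    def __init__(self, known_facts):
        self.known_facts = known_facts

    def search(self, claim):
        words = set(claim.lower().split())
        hits = [f for f in self.known_facts if words & set(f.lower().split())]
        return Evidence(supports_claim=bool(hits), snippets=hits or ["no results"])
```

Because `Evidence.__str__` includes the matched snippets, interpolating `{evidence}` into the revision prompt gives the LLM concrete material to correct against, not just a boolean.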
Reflection is one of the most reliable and immediately useful patterns. Andrew Ng considers it and Tool Use the two most mature agentic patterns, noting: “Reflection is a relatively basic type of agentic workflow, but I’ve been delighted by how much it improved my applications’ results.”
The reflection pattern draws from several important papers: