Chapter 3 – Reflection Pattern

The LLM as Its Own Critic

Reflection is perhaps the most accessible and immediately impactful agentic design pattern. The core idea is simple: ask the LLM to examine its own output, identify weaknesses, and improve it.

This mirrors how humans work. A skilled writer doesn’t submit their first draft. They write, re-read, critique, and revise. Reflection gives LLMs this same capability, and the results can be dramatic — turning mediocre first-draft outputs into polished, high-quality results.

How Reflection Works

The simplest form of reflection is a three-step loop:

  1. Generate: The LLM produces an initial output
  2. Reflect: The LLM (or a second LLM call) critiques the output and suggests improvements
  3. Revise: The LLM rewrites the output incorporating the feedback
  4. Repeat steps 2–3 until the output is satisfactory
    ┌──────────┐     ┌──────────────┐     ┌──────────┐
    │ Generate │────►│   Reflect    │────►│  Revise  │
    │ (Draft)  │     │  (Critique)  │     │(Improved)│
    └──────────┘     └──────────────┘     └─────┬────┘
                           ▲                     │
                           │                     │
                           └─────────────────────┘
                              (repeat N times)

Self-Reflection: The Basic Pattern

In self-reflection, a single LLM is used for both generation and critique. The key is crafting the critique prompt to be specific and constructive.

# See code/reflection.py for the full implementation

def self_reflect(llm, task, max_rounds=3):
    """Generate output and iteratively improve it through self-reflection."""
    
    # Step 1: Initial generation
    draft = llm.generate(f"Perform the following task:\n{task}")
    
    for _ in range(max_rounds):
        # Step 2: Self-critique
        critique = llm.generate(
            f"You produced the following output for the task '{task}':\n\n"
            f"{draft}\n\n"
            f"Critically evaluate this output. Identify:\n"
            f"1. Errors or inaccuracies\n"
            f"2. Missing information\n"
            f"3. Areas where clarity could be improved\n"
            f"4. Logical gaps or unsupported claims\n"
            f"Be specific and constructive."
        )
        
        # Step 3: Revise based on critique
        draft = llm.generate(
            f"Original task: {task}\n\n"
            f"Your previous output:\n{draft}\n\n"
            f"Critique of your output:\n{critique}\n\n"
            f"Revise your output to address the critique. "
            f"Produce an improved version."
        )
    
    return draft
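
The snippets in this chapter assume an `llm` object exposing a `generate(prompt, system=None)` method that returns a string; the exact client is left unspecified. A minimal stand-in (a hypothetical `FakeLLM`, not part of the book's code) lets you exercise the loops locally without network calls:

```python
class FakeLLM:
    """Stand-in for a real LLM client: returns canned responses in order."""

    def __init__(self, responses):
        self.responses = list(responses)
        self.calls = []  # record of (prompt, system) pairs for inspection

    def generate(self, prompt, system=None):
        self.calls.append((prompt, system))
        return self.responses.pop(0)

# Drive a reflection loop deterministically:
llm = FakeLLM(["draft v1", "critique: too vague", "draft v2"])
print(llm.generate("Perform the following task:\nSummarize X"))  # → draft v1
```

Swapping in a real client only requires matching this `generate` signature.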

Two-Agent Reflection

A more effective variant uses two distinct agents — a generator and a critic — with different system prompts optimized for their roles.

# See code/reflection.py for the full implementation

def two_agent_reflect(generator_llm, critic_llm, task, max_rounds=3):
    """Use separate generator and critic agents for reflection."""
    
    generator_prompt = (
        "You are an expert writer. Produce clear, accurate, "
        "well-structured output."
    )
    
    critic_prompt = (
        "You are a demanding but constructive editor. "
        "Your job is to find flaws, gaps, and areas for improvement. "
        "Be specific. Point to exact phrases or sections that need work. "
        "If the output needs no further changes, reply 'No major issues.'"
    )
    
    draft = generator_llm.generate(task, system=generator_prompt)
    
    for _ in range(max_rounds):
        critique = critic_llm.generate(
            f"Review this output for the task '{task}':\n\n{draft}",
            system=critic_prompt
        )
        
        if "no major issues" in critique.lower():
            break  # Critic is satisfied
        
        draft = generator_llm.generate(
            f"Task: {task}\nYour draft:\n{draft}\n"
            f"Editor feedback:\n{critique}\n"
            f"Revise accordingly.",
            system=generator_prompt
        )
    
    return draft

Tool-Augmented Reflection

Reflection becomes even more powerful when the critic has access to tools that provide ground truth. Instead of relying on the LLM’s judgment alone, the system can verify outputs against real-world data.

Code Reflection with Test Execution

For code generation, the most effective reflection strategy is to run the generated code against test cases:

# See code/reflection.py for the full implementation

def code_reflection(llm, task, test_cases, max_rounds=5):
    """Generate code and refine it using test execution feedback."""
    
    code = llm.generate(f"Write Python code to: {task}")
    
    for _ in range(max_rounds):
        # Run the code against test cases
        results = run_tests(code, test_cases)
        
        if results.all_passed:
            return code  # All tests pass — we're done
        
        # Reflect on failures
        code = llm.generate(
            f"Your code for '{task}':\n```python\n{code}\n```\n\n"
            f"Test results:\n{results.summary}\n\n"
            f"Fix the code to pass all tests."
        )
    
    return code
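
The `run_tests` helper above is left unspecified. One possible sketch, assuming each test case is a string of Python `assert` statements executed in the same namespace as the generated code (this is an illustration, not the book's implementation):

```python
from dataclasses import dataclass, field

@dataclass
class TestResults:
    """Aggregated outcome of running generated code against test cases."""
    total: int = 0
    failures: list = field(default_factory=list)

    @property
    def all_passed(self):
        return not self.failures

    @property
    def summary(self):
        if self.all_passed:
            return f"All {self.total} tests passed."
        return "\n".join(self.failures)

def run_tests(code, test_cases):
    """Execute generated code, then each test, in a shared namespace."""
    results = TestResults(total=len(test_cases))
    namespace = {}
    try:
        exec(code, namespace)  # define functions from the generated code
    except Exception as e:
        results.failures.append(f"Code failed to load: {e}")
        return results
    for test in test_cases:
        try:
            exec(test, namespace)  # e.g. "assert add(2, 3) == 5"
        except AssertionError:
            results.failures.append(f"FAILED: {test}")
        except Exception as e:
            results.failures.append(f"ERROR in {test}: {e}")
    return results
```

A production version would sandbox the `exec` calls (subprocess, timeout, restricted imports); running model-generated code in-process is unsafe outside experiments.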

For research tasks, the critic can verify claims by searching for supporting evidence:

# See code/reflection.py for the full implementation

def research_reflection(llm, topic, search_tool, max_rounds=3):
    """Generate research content and verify claims."""
    
    report = llm.generate(f"Write a research summary about: {topic}")
    
    for _ in range(max_rounds):
        # Extract key claims, one per line
        claims_text = llm.generate(
            f"Extract the 3 most important factual claims from:\n{report}\n"
            f"List one claim per line."
        )
        claims = [line.strip() for line in claims_text.splitlines() if line.strip()]
        
        # Verify each claim against search results
        for claim in claims:
            evidence = search_tool.search(claim)
            if not evidence.supports_claim:
                report = llm.generate(
                    f"Your report:\n{report}\n\n"
                    f"This claim could not be verified: '{claim}'\n"
                    f"Search results: {evidence}\n"
                    f"Revise the report to correct or remove "
                    f"unverified claims."
                )
    
    return report
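
The loop above leans on an assumed interface: `search_tool.search(claim)` returns an object with a `supports_claim` flag and a readable string form that can be interpolated into the revision prompt. Neither is specified here; one possible sketch (hypothetical names, not the book's code):

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    """Assumed shape of a search result consumed by research_reflection."""
    supports_claim: bool
    snippets: list = field(default_factory=list)

    def __str__(self):
        # The revision prompt interpolates the evidence object directly
        return "\n".join(self.snippets) or "(no supporting results found)"

class SearchTool:
    """Hypothetical wrapper; a real version would call a web-search API and
    decide supports_claim, e.g. via an LLM entailment check on the snippets."""

    def search(self, claim: str) -> Evidence:
        raise NotImplementedError
```

Deciding `supports_claim` is itself a judgment call; a common approach is a second LLM pass asking whether the snippets entail the claim.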

When to Use Reflection

Reflection is one of the most reliable and immediately useful patterns. Andrew Ng considers it and Tool Use the two most mature agentic patterns, noting: “Reflection is a relatively basic type of agentic workflow, but I’ve been delighted by how much it improved my applications’ results.”

Good Fits for Reflection

When Reflection May Not Help

Key Research

The reflection pattern draws from several important papers:

Practical Tips

  1. Limit reflection rounds: 2–3 rounds usually captures most of the benefit; beyond that, you get diminishing returns
  2. Make critique prompts specific: “Find errors” is less effective than “Check for logical consistency, factual accuracy, and completeness”
  3. Use different temperatures: Higher temperature for generation (creativity), lower for critique (precision)
  4. Add stopping criteria: Let the critic signal when the output is satisfactory to avoid unnecessary rounds
  5. Log everything: Track how the output changes across rounds to understand where reflection helps most
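
Tips 3 and 4 can be combined in one small loop. This sketch assumes the client's `generate` accepts a `temperature` argument and that the critic is instructed to emit the sentinel phrase "NO MAJOR ISSUES" when satisfied; both are assumptions about your particular API:

```python
def reflect_with_stopping(llm, task, max_rounds=3):
    """Reflection loop with role-specific temperatures and a critic-controlled stop."""
    draft = llm.generate(f"Perform the task:\n{task}", temperature=0.8)  # creative
    for _ in range(max_rounds):
        critique = llm.generate(
            f"Critique this output for '{task}'. "
            f"Reply exactly 'NO MAJOR ISSUES' if none remain.\n\n{draft}",
            temperature=0.2,  # precise
        )
        if "no major issues" in critique.lower():
            break  # critic is satisfied; stop early to save rounds
        draft = llm.generate(
            f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\nRevise.",
            temperature=0.8,
        )
    return draft
```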
