Chapter 2 – The Agent Loop

From Single Calls to Iterative Execution

The defining characteristic of an agent — what separates it from a simple LLM call — is the loop. An agent doesn’t just generate a response; it enters a cycle of reasoning, acting, and observing that continues until the task is complete or a stopping condition is met.

Anatomy of the Agent Loop

Every agent loop follows the same basic structure:

                    ┌─────────────┐
                    │   START     │
                    │  (User Goal)│
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
               ┌───►│   REASON    │
               │    │  (Think,    │
               │    │   Plan)     │
               │    └──────┬──────┘
               │           │
               │    ┌──────▼──────┐
               │    │    ACT      │
               │    │  (Call tool,│
               │    │   generate) │
               │    └──────┬──────┘
               │           │
               │    ┌──────▼──────┐
               │    │  OBSERVE    │
               │    │ (Read result│
               │    │  check goal)│
               │    └──────┬──────┘
               │           │
               │     ┌─────▼─────┐
               │     │  Done?    │──── Yes ───► RETURN RESULT
               │     └─────┬─────┘
               │           │ No
               └───────────┘

The four phases of every iteration:

  1. Reason — The LLM analyzes the current state, considers what has been done so far, and decides what to do next
  2. Act — The LLM either calls a tool, generates output, or requests information
  3. Observe — The system captures the result of the action and feeds it back to the LLM
  4. Evaluate — The system (or the LLM) determines whether the goal has been achieved or another iteration is needed

The ReAct Pattern

The ReAct (Reasoning + Acting) pattern, introduced by Yao et al. (2022), formalized the interleaving of reasoning and action steps. In ReAct, the LLM explicitly generates a Thought (free-text reasoning about the current state) and an Action (a tool call or an answer attempt); the environment then returns an Observation, which is appended to the context. This Thought → Action → Observation cycle repeats until the model produces a final answer.

# See code/agent_loop.py for the full implementation

def agent_loop(goal, tools, llm, max_iterations=10):
    """A minimal agent loop implementing the ReAct pattern."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": goal}
    ]

    for i in range(max_iterations):
        response = llm.generate(messages, tools=tools)

        if response.is_final_answer:
            return response.content

        # Record the assistant turn before the tool results so the
        # transcript preserves cause-and-effect order.
        messages.append({"role": "assistant", "content": response.content})

        for tool_call in response.tool_calls:
            result = execute_tool(tool_call, tools)
            messages.append({"role": "tool", "content": result})

    return "Max iterations reached."
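To see the control flow concretely, the same loop can be driven by a scripted stand-in for the LLM. Everything below (FakeLLM, the response shape) is illustrative test scaffolding rather than a real client library:

```python
class FakeResponse:
    """Mimics the shape of an LLM response used by the loop above."""
    def __init__(self, content, is_final_answer=False, tool_calls=None):
        self.content = content
        self.is_final_answer = is_final_answer
        self.tool_calls = tool_calls or []

class FakeLLM:
    """Replays a fixed script of responses, one per loop iteration."""
    def __init__(self, script):
        self.script = iter(script)

    def generate(self, messages, tools=None):
        return next(self.script)

def run_demo():
    script = [
        FakeResponse("searching...", tool_calls=["search('python 3.13')"]),
        FakeResponse("Final comparison written.", is_final_answer=True),
    ]
    llm = FakeLLM(script)
    messages = [{"role": "user", "content": "demo goal"}]
    for _ in range(10):
        response = llm.generate(messages)
        if response.is_final_answer:
            return response.content, len(messages)
        # Assistant turn first, then each tool result.
        messages.append({"role": "assistant", "content": response.content})
        for call in response.tool_calls:
            messages.append({"role": "tool", "content": f"result of {call}"})
    return "Max iterations reached.", len(messages)
```

The scripted run takes exactly two iterations: one tool-calling turn that grows the transcript, then a final answer that exits the loop.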

Stopping Conditions

An agent loop without proper stopping conditions is dangerous — it can run forever, consuming tokens and money. Every production agent needs clear termination criteria:

Natural Completion

The LLM determines the task is done and returns a final answer. This is the ideal case.

Maximum Iterations

A hard cap on the number of loop iterations. This is the most basic safety net.

Maximum Tokens / Cost

A budget constraint that stops execution when token usage or cost exceeds a threshold.

Timeout

A wall-clock time limit for the entire agent execution.

Human Checkpoint

The agent pauses at predefined points to request human approval before continuing.

# See code/agent_loop.py for the full implementation

import time

class StoppingCondition:
    def __init__(self, max_iterations=20, max_tokens=50000, 
                 timeout_seconds=300):
        self.max_iterations = max_iterations
        self.max_tokens = max_tokens
        self.timeout = timeout_seconds
        self.start_time = time.time()
        self.total_tokens = 0
    
    def should_stop(self, iteration, tokens_used):
        self.total_tokens += tokens_used
        if iteration >= self.max_iterations:
            return True, "max_iterations"
        if self.total_tokens >= self.max_tokens:
            return True, "max_tokens"
        if time.time() - self.start_time > self.timeout:
            return True, "timeout"
        return False, None
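The human-checkpoint condition sits outside these numeric limits. A minimal sketch for a synchronous, terminal-based agent follows; the `ask` parameter is an illustrative hook (defaulting to `input`) so the same gate could be driven by a queue or UI instead of stdin:

```python
def require_approval(action_description, ask=input):
    """Pause the loop and ask a human to approve the next action.

    Returns True only on an explicit "y"; anything else, including an
    empty reply, is treated as a rejection -- the safe default.
    """
    answer = ask(f"Agent wants to: {action_description}. Approve? [y/N] ")
    return answer.strip().lower() == "y"
```

The loop would call this before any irreversible action, such as sending an email or deleting a file, and stop or replan on rejection.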

Ground Truth and Environment Feedback

A critical insight from Anthropic’s agent guide: during execution, agents must gain “ground truth” from the environment at each step to assess progress. In practice, this means feeding the agent concrete signals such as tool outputs, error messages, test results, and API responses, rather than letting it reason from its own assumptions about what happened.

The quality of the feedback loop — how much real-world signal the agent gets at each step — is often the single biggest factor in agent effectiveness.
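One concrete way to obtain that signal in a coding agent is to run a cheap verification command after each action and feed its raw output back into the context. A sketch, assuming the agent has just edited a project with a test suite (the default command is illustrative):

```python
import subprocess

def observe_with_ground_truth(command=("python", "-m", "pytest", "-q")):
    """Run a verification command and return the real outcome.

    The exit code and captured output come from the environment itself,
    not from the model's belief about whether its last edit worked.
    """
    proc = subprocess.run(command, capture_output=True, text=True)
    return {
        "passed": proc.returncode == 0,
        "output": (proc.stdout + proc.stderr).strip(),
    }
```

The returned dict would be appended to the transcript as the observation for the next Reason phase.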

Structured vs. Unstructured Actions

Agents can express their actions in different formats:

JSON Tool Calls

The most common format. The LLM generates structured JSON specifying the tool name and parameters. Most LLM APIs support this natively via function/tool calling.

{
  "tool": "web_search",
  "arguments": {
    "query": "latest Python 3.13 features"
  }
}
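On the receiving side, the runtime parses this JSON and routes it to a registered function. A minimal dispatcher sketch follows; the registry and the `web_search` stub are illustrative, not a specific vendor API:

```python
import json

def web_search(query):
    """Stub standing in for a real search tool."""
    return f"results for: {query}"

# Maps tool names (as the LLM emits them) to Python callables.
TOOL_REGISTRY = {"web_search": web_search}

def dispatch_tool_call(raw_json):
    """Parse a JSON tool call and invoke the matching registered tool."""
    call = json.loads(raw_json)
    tool = TOOL_REGISTRY.get(call["tool"])
    if tool is None:
        return f"Unknown tool: {call['tool']}"
    return tool(**call["arguments"])
```

Returning an error string for an unknown tool, rather than raising, lets the loop feed the failure back to the LLM as an observation it can react to.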

Code Actions (CodeAct)

Research from Wang et al. (2024) showed that using executable Python code as the action format outperforms JSON — with up to 20% higher success rates. The agent writes Python code that gets executed in a sandboxed interpreter.

# The agent generates this as its "action"
results = web_search("latest Python 3.13 features")
summary = "\n".join([r.title for r in results[:5]])
print(summary)

The CodeAct approach is more flexible because the agent can compose tools, use variables, write loops, and handle errors — all within a single action step.
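As an illustration, the following single CodeAct action chains a fallback query and a summary step that would otherwise take several JSON tool-call round trips. The `web_search` stub, `Result` class, and `ToolError` are stand-ins for tools a real sandbox would expose:

```python
class ToolError(Exception):
    pass

class Result:
    def __init__(self, title):
        self.title = title

def web_search(query):
    # Stub: the first query "fails" to demonstrate inline recovery.
    if "release notes" in query:
        raise ToolError("search backend unavailable")
    return [Result(f"hit {i} for {query}") for i in range(3)]

# --- The agent's single CodeAct action starts here ---
try:
    results = web_search("Python 3.13 release notes")
except ToolError:
    # Recover inline with a rephrased query -- no extra loop iteration.
    results = web_search("what's new in Python 3.13")

summary = "\n".join(r.title for r in results[:5])
print(summary)
```

In the JSON format, the failed search, the retry, and the summarization would each cost a full trip through the loop; here they happen in one action step.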

Natural Language Actions

In some multi-agent systems, agents communicate through natural language messages rather than structured tool calls. One agent simply writes a message to another, and the receiving agent interprets it.

Error Handling and Recovery

Robust agents must handle failures gracefully: tools throw errors, APIs time out, and results come back malformed. A loop that treats every observation as a success will compound mistakes instead of correcting them.

Good error handling strategies include:

  1. Retry with backoff — Try the same action again after a delay
  2. Alternative tools — Switch to a different tool that can accomplish the same goal
  3. Replanning — Ask the LLM to reconsider its approach given the error
  4. Graceful degradation — Return partial results with an explanation of what failed

# See code/agent_loop.py for the full implementation

def execute_tool_with_recovery(tool_call, tools, llm, max_retries=3):
    for attempt in range(max_retries):
        try:
            return execute_tool(tool_call, tools)
        except ToolError as e:
            if attempt == max_retries - 1:
                return f"Tool '{tool_call.name}' failed after {max_retries} attempts: {e}"
            time.sleep(2 ** attempt)  # exponential backoff before retrying
            # Ask the LLM to suggest an alternative approach
            recovery = llm.generate(
                f"Tool call failed with error: {e}. "
                f"Suggest an alternative approach."
            )
            if recovery.suggests_alternative_tool:
                tool_call = recovery.alternative_tool_call

The Agent Loop in Practice

Here’s a complete example showing how a simple research agent loop works in practice:

  1. User: “What are the key differences between Python 3.12 and 3.13?”
  2. Agent thinks: I need to search for Python 3.13 release notes
  3. Agent acts: Calls web_search("Python 3.13 new features changelog")
  4. Observation: Returns search results with links to PEPs and docs
  5. Agent thinks: I have some results, but I should also check 3.12 features for comparison
  6. Agent acts: Calls web_search("Python 3.12 major features")
  7. Observation: Returns search results about 3.12
  8. Agent thinks: I now have enough information to write a comparison
  9. Agent acts: Generates a structured comparison document
  10. Done: Returns the final answer

Each step is a turn through the agent loop. The total number of turns was not predetermined — the agent decided dynamically how many iterations it needed.
