The defining characteristic of an agent — what separates it from a simple LLM call — is the loop. An agent doesn’t just generate a response; it enters a cycle of reasoning, acting, and observing that continues until the task is complete or a stopping condition is met.
Every agent loop follows the same basic structure:
┌─────────────┐
│ START │
│ (User Goal)│
└──────┬──────┘
│
┌──────▼──────┐
┌───►│ REASON │
│ │ (Think, │
│ │ Plan) │
│ └──────┬──────┘
│ │
│ ┌──────▼──────┐
│ │ ACT │
│ │ (Call tool,│
│ │ generate) │
│ └──────┬──────┘
│ │
│ ┌──────▼──────┐
│ │ OBSERVE │
│ │ (Read result│
│ │ check goal)│
│ └──────┬──────┘
│ │
│ ┌─────▼─────┐
│ │ Done? │──── Yes ───► RETURN RESULT
│ └─────┬─────┘
│ │ No
└───────────┘
The four phases of every iteration:
1. Reason: The model thinks about the goal and plans its next step.
2. Act: It calls a tool or generates output.
3. Observe: It reads the result of the action and checks it against the goal.
4. Decide: If the goal is met, it returns the result; otherwise, it loops again.
The ReAct (Reasoning + Acting) pattern, introduced by Yao et al. (2022), formalized the interleaving of reasoning and action steps. In ReAct, the LLM explicitly generates a Thought (its reasoning about what to do next) and an Action (a tool call or operation), then receives an Observation (the result of that action).
This cycle repeats until the model produces a final answer.
# See code/agent_loop.py for the full implementation
def agent_loop(goal, tools, llm, max_iterations=10):
    """A minimal agent loop implementing the ReAct pattern."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": goal},
    ]
    for i in range(max_iterations):
        response = llm.generate(messages, tools=tools)
        if response.is_final_answer:
            return response.content
        # Record the assistant's reasoning/tool request first, then the
        # tool results, so the conversation history stays in order.
        messages.append({"role": "assistant", "content": response.content})
        if response.tool_calls:
            for tool_call in response.tool_calls:
                result = execute_tool(tool_call, tools)
                messages.append({"role": "tool", "content": result})
    return "Max iterations reached."
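To make the control flow concrete, here is a toy run of such a loop with a scripted stand-in for the LLM. The `ScriptedLLM`, `Response`, and tool names below are illustrative stubs (and the system prompt is dropped for brevity), not a real client:

```python
from dataclasses import dataclass, field

@dataclass
class Response:
    content: str
    is_final_answer: bool = False
    tool_calls: list = field(default_factory=list)

class ScriptedLLM:
    """Returns canned responses: one tool call, then a final answer."""
    def __init__(self):
        self.turn = 0

    def generate(self, messages, tools=None):
        self.turn += 1
        if self.turn == 1:
            return Response("searching...", tool_calls=[("web_search", "Python 3.13")])
        return Response("Python 3.13 ships an experimental JIT.", is_final_answer=True)

def execute_tool(tool_call, tools):
    name, arg = tool_call
    return tools[name](arg)

def agent_loop(goal, tools, llm, max_iterations=10):
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_iterations):
        response = llm.generate(messages, tools=tools)
        if response.is_final_answer:
            return response.content
        messages.append({"role": "assistant", "content": response.content})
        for tool_call in response.tool_calls:
            messages.append({"role": "tool", "content": execute_tool(tool_call, tools)})
    return "Max iterations reached."

answer = agent_loop("What's new in Python 3.13?",
                    {"web_search": lambda q: f"results for {q}"},
                    ScriptedLLM())
print(answer)  # → Python 3.13 ships an experimental JIT.
```

The first turn produces a tool call, the second a final answer, so the loop exits after two iterations without hitting the cap.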
An agent loop without proper stopping conditions is dangerous — it can run forever, consuming tokens and money. Every production agent needs clear termination criteria:
1. Goal completion: The LLM determines the task is done and returns a final answer. This is the ideal case.
2. Maximum iterations: A hard cap on the number of loop iterations. This is the most basic safety net.
3. Cost budget: A budget constraint that stops execution when token usage or cost exceeds a threshold.
4. Timeout: A wall-clock time limit for the entire agent execution.
5. Human-in-the-loop checkpoints: The agent pauses at predefined points to request human approval before continuing.
# See code/agent_loop.py for the full implementation
import time

class StoppingCondition:
    def __init__(self, max_iterations=20, max_tokens=50000,
                 timeout_seconds=300):
        self.max_iterations = max_iterations
        self.max_tokens = max_tokens
        self.timeout = timeout_seconds
        self.start_time = time.time()
        self.total_tokens = 0

    def should_stop(self, iteration, tokens_used):
        self.total_tokens += tokens_used
        if iteration >= self.max_iterations:
            return True, "max_iterations"
        if self.total_tokens >= self.max_tokens:
            return True, "max_tokens"
        if time.time() - self.start_time > self.timeout:
            return True, "timeout"
        return False, None
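In use, the condition is checked once per turn. The sketch below restates the class so the snippet runs standalone and simulates a loop that hits the iteration cap; the per-turn token count is made up:

```python
import time

# Restated from above so this snippet runs standalone.
class StoppingCondition:
    def __init__(self, max_iterations=20, max_tokens=50000,
                 timeout_seconds=300):
        self.max_iterations = max_iterations
        self.max_tokens = max_tokens
        self.timeout = timeout_seconds
        self.start_time = time.time()
        self.total_tokens = 0

    def should_stop(self, iteration, tokens_used):
        self.total_tokens += tokens_used
        if iteration >= self.max_iterations:
            return True, "max_iterations"
        if self.total_tokens >= self.max_tokens:
            return True, "max_tokens"
        if time.time() - self.start_time > self.timeout:
            return True, "timeout"
        return False, None

stop = StoppingCondition(max_iterations=3, max_tokens=10_000)
for i in range(100):
    done, reason = stop.should_stop(i, 400)  # pretend each turn used ~400 tokens
    if done:
        print(f"Stopped after {i} iterations: {reason}")  # → max_iterations
        break
```

Whichever threshold trips first wins; here the iteration cap fires long before the token budget or timeout.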
A critical insight from Anthropic’s agent guide: during execution, agents must gain “ground truth” from the environment at each step to assess progress. This means reading actual tool outputs, checking execution results, and validating intermediate state against the goal, rather than assuming an action succeeded.
The quality of the feedback loop — how much real-world signal the agent gets at each step — is often the single biggest factor in agent effectiveness.
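As a hypothetical illustration of grounding, an agent that writes a file should verify success by reading the environment back rather than trusting the tool's status string. The `write_config` tool and file contents here are invented:

```python
import os
import tempfile

def write_config(path, text):
    """Invented example tool: writes a file and returns a status string."""
    with open(path, "w") as f:
        f.write(text)
    return "ok"  # a status string is weak evidence on its own

path = os.path.join(tempfile.mkdtemp(), "settings.ini")
status = write_config(path, "[app]\ndebug = false\n")

# Ground truth: re-read the environment instead of trusting `status`.
observed = open(path).read()
goal_met = "debug = false" in observed
print(goal_met)  # → True
```

The re-read gives the agent a real observation to reason over on the next turn, instead of an optimistic assumption.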
Agents can express their actions in different formats:
JSON tool calls are the most common action format. The LLM generates structured JSON specifying the tool name and parameters, and most LLM APIs support this natively via function/tool calling.
{
"tool": "web_search",
"arguments": {
"query": "latest Python 3.13 features"
}
}
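On the receiving side, a minimal dispatcher parses the JSON and routes it to a registered function. The `web_search` stub and `TOOLS` registry below are illustrative:

```python
import json

def web_search(query):
    """Illustrative stand-in for a real search tool."""
    return [f"result for: {query}"]

TOOLS = {"web_search": web_search}

def dispatch(tool_call_json):
    """Parse a JSON tool call and route it to the registered function."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["tool"]]        # KeyError here means an unknown tool
    return fn(**call["arguments"])  # arguments map to keyword parameters

results = dispatch('{"tool": "web_search", '
                   '"arguments": {"query": "latest Python 3.13 features"}}')
print(results)  # → ['result for: latest Python 3.13 features']
```

In production the dispatcher would also validate arguments against the tool's schema before calling it.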
Research from Wang et al. (2024) on CodeAct showed that using executable Python code as the action format outperforms JSON, with up to 20% higher success rates. The agent writes Python code that gets executed in a sandboxed interpreter.
# The agent generates this as its "action"
results = web_search("latest Python 3.13 features")
summary = "\n".join([r.title for r in results[:5]])
print(summary)
The CodeAct approach is more flexible because the agent can compose tools, use variables, write loops, and handle errors — all within a single action step.
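A toy sketch of code-as-action execution uses `exec` with a restricted namespace. Note that a namespace is not a security boundary; real CodeAct systems run the code in an isolated interpreter or container. All names here are invented:

```python
import io
import contextlib

class Result:
    def __init__(self, title):
        self.title = title

def web_search(query):
    """Invented stand-in tool returning objects with a .title attribute."""
    return [Result(f"hit {i} for {query}") for i in range(3)]

def run_code_action(code, tools):
    """Execute a code action with only whitelisted tools in scope (toy sketch)."""
    buf = io.StringIO()
    namespace = dict(tools)  # NOT a sandbox; real systems isolate this process
    with contextlib.redirect_stdout(buf):
        exec(code, namespace)
    return buf.getvalue()

action = r'''
results = web_search("latest Python 3.13 features")
summary = "\n".join(r.title for r in results[:5])
print(summary)
'''
output = run_code_action(action, {"web_search": web_search})
print(output)
```

Capturing stdout turns whatever the action prints into the observation fed back to the model on the next turn.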
In some multi-agent systems, agents communicate through natural language messages rather than structured tool calls. One agent simply writes a message to another, and the receiving agent interprets it.
Robust agents must handle failures gracefully: tools error out, APIs time out, and responses come back malformed. Good error handling strategies include retrying failed calls, feeding the error message back into the conversation so the model can adjust, and asking the model to suggest an alternative approach.
# See code/agent_loop.py for the full implementation
def execute_tool_with_recovery(tool_call, tools, llm, max_retries=3):
    for attempt in range(max_retries):
        try:
            return execute_tool(tool_call, tools)
        except ToolError as e:
            if attempt == max_retries - 1:
                return (f"Tool '{tool_call.name}' failed after "
                        f"{max_retries} attempts: {e}")
            # Ask the LLM to suggest an alternative approach
            recovery = llm.generate(
                f"Tool call failed with error: {e}. "
                f"Suggest an alternative approach."
            )
            if recovery.suggests_alternative_tool:
                tool_call = recovery.alternative_tool_call
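Stripped of the LLM recovery step, the retry skeleton can be sketched standalone. Here `make_flaky_tool` simulates a tool that fails twice before succeeding; all names are illustrative:

```python
class ToolError(Exception):
    pass

def make_flaky_tool(fail_times=2):
    """Build a tool that raises ToolError `fail_times` times, then succeeds."""
    state = {"calls": 0}
    def tool():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise ToolError(f"transient failure #{state['calls']}")
        return "tool output"
    return tool

def execute_with_retries(fn, max_retries=3):
    """Retry a tool call, returning an error string once retries are exhausted."""
    for attempt in range(max_retries):
        try:
            return fn()
        except ToolError as e:
            if attempt == max_retries - 1:
                return f"failed after {max_retries} attempts: {e}"

result = execute_with_retries(make_flaky_tool())
print(result)  # → tool output
```

Returning the error string rather than raising keeps the failure inside the loop, where the model can observe it and react.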
Here’s a complete example showing how a simple research agent loop works in practice:
web_search("Python 3.13 new features changelog")
web_search("Python 3.12 major features")
Each step is a turn through the agent loop. The total number of turns was not predetermined — the agent decided dynamically how many iterations it needed.