Before we can build agents that plan, reflect, and collaborate, we need to understand the fundamental building block from which all agentic systems are constructed: the augmented LLM.
A bare LLM takes text in and produces text out. It’s powerful, but limited — it can’t browse the web, execute code, query a database, or remember what happened three conversations ago. An augmented LLM extends the base model with three key capabilities:
┌──────────────────────────────────────┐
│            Augmented LLM             │
│                                      │
│  ┌───────────┐  ┌───────────────┐    │
│  │ Retrieval │  │     Tools     │    │
│  │ (RAG,     │  │ (APIs, code,  │    │
│  │  search)  │  │  web search)  │    │
│  └───────────┘  └───────────────┘    │
│        ┌──────────────┐              │
│        │    Memory    │              │
│        │ (short/long  │              │
│        │    term)     │              │
│        └──────────────┘              │
│                                      │
│        ┌──────────────┐              │
│        │   Base LLM   │              │
│        └──────────────┘              │
└──────────────────────────────────────┘
LLMs are trained on static datasets with a knowledge cutoff date. Retrieval augmentation gives them access to current, domain-specific, or private information at inference time.
The most common approach is Retrieval-Augmented Generation (RAG): retrieve the documents most relevant to the query, then include them in the prompt as context for the model's answer.
Modern augmented LLMs can actively drive their own retrieval — generating search queries, evaluating results, and deciding whether to search again with refined terms. This moves retrieval from a passive preprocessing step to an active tool the model wields.
# See code/augmented_llm_retrieval.py for the full implementation
# Simplified RAG pipeline
query = "What are the best practices for agent tool design?"
results = vector_store.similarity_search(query, k=3)
context = "\n".join([doc.content for doc in results])
response = llm.generate(
    f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
)
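The active, model-driven retrieval described above can be sketched as a small loop. Everything here is illustrative: `SearchIndex` is a toy keyword index standing in for a real vector store, and the two callables stand in for LLM calls that propose refined queries and judge whether enough context has been gathered.

```python
class SearchIndex:
    """Toy keyword index standing in for a vector store."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, query, k=3):
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.lower().split())), d) for d in self.docs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for score, d in scored[:k] if score > 0]


def agentic_retrieve(llm_propose, llm_is_sufficient, index, question, max_rounds=3):
    """Let the model drive retrieval: search, evaluate, optionally refine."""
    query = question
    gathered = []
    for _ in range(max_rounds):
        for result in index.search(query):
            if result not in gathered:
                gathered.append(result)
        if llm_is_sufficient(question, gathered):  # model judges coverage
            break
        query = llm_propose(question, gathered)    # model refines the query
    return gathered
```

In a real system, `llm_propose` and `llm_is_sufficient` would each be a call to the model; here any callables with the same shape will do.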
Tools are functions that the LLM can request to call. When the LLM needs to perform an action or gather information it cannot produce from its training data, it generates a structured request (typically JSON) specifying which tool to call and with what arguments.
A tool is defined by a name, a description of what it does and when to use it, and a schema for its parameters:
# See code/augmented_llm_tools.py for the full implementation
tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information. Use when the user asks about recent events or needs up-to-date data.",
        "parameters": {
            "query": {"type": "string", "description": "The search query"}
        }
    },
    {
        "name": "calculator",
        "description": "Perform mathematical calculations. Use for any arithmetic, financial, or scientific computation.",
        "parameters": {
            "expression": {"type": "string", "description": "The math expression to evaluate"}
        }
    }
]
The LLM doesn’t execute the tools itself — it generates a request, and the runtime infrastructure handles execution and returns the result. This separation is fundamental to agent safety and control.
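That separation can be sketched as a minimal dispatch layer. The names here are illustrative; a real runtime would also validate arguments against the tool's schema and sandbox execution properly.

```python
import json

# Map tool names to implementations. This calculator evaluates arithmetic
# with builtins disabled; a production runtime would use a real sandbox.
TOOL_REGISTRY = {
    "calculator": lambda args: str(eval(args["expression"], {"__builtins__": {}})),
}


def handle_tool_call(raw_request: str) -> str:
    """Execute a model-generated tool call and return the result as text."""
    request = json.loads(raw_request)  # e.g. {"name": "...", "arguments": {...}}
    tool = TOOL_REGISTRY.get(request["name"])
    if tool is None:
        return f"Error: unknown tool {request['name']!r}"
    try:
        return tool(request["arguments"])
    except Exception as exc:  # report failures back to the model as text
        return f"Error: {exc}"
```

Note that errors are returned to the model as ordinary text rather than raised: this lets the LLM see what went wrong and decide how to recover.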
Memory gives agents the ability to retain and recall information. There are several types:
Short-term memory is the most basic form: the conversation history that fits within the model's context window. Every message in the conversation is included in each new request to the LLM.
Limitation: Context windows are finite (typically 4K to 200K tokens). As conversations grow, older messages must be truncated or summarized.
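A common mitigation is to drop the oldest turns once a token budget is exceeded while keeping the system prompt intact. The sketch below approximates token counts by word count; a real implementation would use the model's tokenizer and might summarize old turns rather than discard them.

```python
def truncate_history(messages, max_tokens=1000):
    """Keep the system message plus the newest messages under the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    kept = []
    used = sum(len(m["content"].split()) for m in system)
    for message in reversed(rest):  # walk from newest to oldest
        cost = len(message["content"].split())
        if used + cost > max_tokens:
            break  # everything older than this is dropped
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```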
Working memory is a scratchpad where the agent stores intermediate results, plans, and observations during a single task execution. It is typically implemented as a structured object or text buffer that the agent can read from and write to.
Long-term memory is information that persists across sessions. It is typically implemented with vector databases, key-value stores, or structured databases.
# See code/augmented_llm_memory.py for the full implementation
class AgentMemory:
    def __init__(self):
        self.short_term = []         # Conversation history
        self.working = {}            # Current task scratchpad
        self.long_term = VectorDB()  # Persistent knowledge

    def remember(self, key, value):
        self.working[key] = value
        self.long_term.store(key, value)

    def recall(self, query, k=5):
        return self.long_term.search(query, k=k)
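Since `VectorDB` above is a placeholder, here is a self-contained version of the same sketch with a toy in-memory store that ranks entries by keyword overlap instead of embedding similarity. It is purely illustrative; a real agent would use an actual vector database.

```python
class InMemoryVectorDB:
    """Toy stand-in for a vector database: keyword overlap, not embeddings."""

    def __init__(self):
        self.entries = []  # (key, value) pairs

    def store(self, key, value):
        self.entries.append((key, value))

    def search(self, query, k=5):
        terms = set(query.lower().split())
        ranked = sorted(
            self.entries,
            key=lambda kv: len(terms & set(str(kv[1]).lower().split())),
            reverse=True,
        )
        return [value for _key, value in ranked[:k]]


class AgentMemory:
    def __init__(self):
        self.short_term = []                 # Conversation history
        self.working = {}                    # Current task scratchpad
        self.long_term = InMemoryVectorDB()  # Persistent knowledge

    def remember(self, key, value):
        self.working[key] = value
        self.long_term.store(key, value)

    def recall(self, query, k=5):
        return self.long_term.search(query, k=k)
```

Usage: `remember` writes to both the scratchpad and the persistent store, so a fact noted during one task can be recalled by similarity search in a later session.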
One of the most significant recent developments in the augmented LLM space is Anthropic’s Model Context Protocol (MCP) — an open standard for connecting LLMs to external data sources and tools.
MCP provides a standardized way for servers to expose tools, data resources, and prompt templates so that any compatible client can discover and use them.
Rather than building custom integrations for every tool, developers can implement an MCP server once and connect it to any MCP-compatible client.
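Under the hood, MCP is built on JSON-RPC 2.0: a client discovers a server's tools with a `tools/list` request and invokes one with `tools/call`. A sketch of what those two client messages look like (the tool name and arguments are illustrative):

```python
import json


def jsonrpc_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request of the kind MCP clients send."""
    message = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server what tools it offers.
list_tools = jsonrpc_request(1, "tools/list")

# Invoke one of them with structured arguments.
call_tool = jsonrpc_request(2, "tools/call", {
    "name": "web_search",
    "arguments": {"query": "agent tool design"},
})
```

Because every MCP server speaks this same wire format, a client that understands these two messages can use any server's tools without custom glue code.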
When building the augmented LLM layer for your agent, focus on two principles:
Tailor capabilities to your use case — Don’t give an agent tools it doesn’t need. A document summarization agent doesn’t need a code executor. A coding agent doesn’t need a weather API.
Provide clear, well-documented interfaces — The LLM will only use tools effectively if it understands what they do. Tool descriptions are prompts — invest in writing them well.
The augmented LLM is the atom from which all agentic molecules are built. Every pattern in the following chapters — from simple prompt chaining to multi-agent orchestration — is ultimately composed of augmented LLM calls.