Before we can build agents that plan, reflect, and collaborate, we need to understand the fundamental building block from which all agentic systems are constructed: the augmented LLM.
A bare LLM takes text in and produces text out. It’s powerful, but limited — it can’t browse the web, execute code, query a database, or remember what happened three conversations ago. An augmented LLM extends the base model with three key capabilities:
┌──────────────────────────────────────┐
│            Augmented LLM             │
│                                      │
│  ┌───────────┐  ┌───────────────┐    │
│  │ Retrieval │  │     Tools     │    │
│  │ (RAG,     │  │ (APIs, code,  │    │
│  │  search)  │  │  web search)  │    │
│  └───────────┘  └───────────────┘    │
│        ┌──────────────┐              │
│        │    Memory    │              │
│        │ (short/long  │              │
│        │    term)     │              │
│        └──────────────┘              │
│                                      │
│        ┌──────────────┐              │
│        │   Base LLM   │              │
│        └──────────────┘              │
└──────────────────────────────────────┘
LLMs are trained on static datasets with a knowledge cutoff date. Retrieval augmentation gives them access to current, domain-specific, or private information at inference time.
The most common approach is Retrieval-Augmented Generation (RAG): retrieve the documents most relevant to the query, then include them in the prompt as context for the model's answer.
Modern augmented LLMs can actively drive their own retrieval — generating search queries, evaluating results, and deciding whether to search again with refined terms. This moves retrieval from a passive preprocessing step to an active tool the model wields.
# See code/augmented_llm_retrieval.py for the full implementation
# Simplified RAG pipeline
query = "What are the best practices for agent tool design?"
results = vector_store.similarity_search(query, k=3)
context = "\n".join([doc.content for doc in results])
response = llm.generate(
    f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
)
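The active, model-driven retrieval described above can be sketched as a small loop. Everything here is illustrative: `SearchIndex` is a toy keyword index standing in for a real vector store, and the two callables stand in for LLM calls that propose refined queries and judge whether enough context has been gathered.

```python
class SearchIndex:
    """Toy keyword index standing in for a vector store."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, query, k=3):
        terms = set(query.lower().split())
        scored = [(len(terms & set(d.lower().split())), d) for d in self.docs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for score, d in scored[:k] if score > 0]


def agentic_retrieve(llm_propose, llm_is_sufficient, index, question, max_rounds=3):
    """Let the model drive retrieval: search, evaluate, optionally refine."""
    query = question
    gathered = []
    for _ in range(max_rounds):
        for result in index.search(query):
            if result not in gathered:
                gathered.append(result)
        if llm_is_sufficient(question, gathered):  # model judges coverage
            break
        query = llm_propose(question, gathered)    # model refines the query
    return gathered
```

In a real system, `llm_propose` and `llm_is_sufficient` would each be a call to the model; here any callables with the same shape will do.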
Tools are functions that the LLM can request to call. When the LLM needs to perform an action or gather information it cannot produce from its training data, it generates a structured request (typically JSON) specifying which tool to call and with what arguments.
A tool is defined by a name, a description of what it does and when to use it, and a schema for its parameters:
# See code/augmented_llm_tools.py for the full implementation
tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information. Use when the user asks about recent events or needs up-to-date data.",
        "parameters": {
            "query": {"type": "string", "description": "The search query"}
        }
    },
    {
        "name": "calculator",
        "description": "Perform mathematical calculations. Use for any arithmetic, financial, or scientific computation.",
        "parameters": {
            "expression": {"type": "string", "description": "The math expression to evaluate"}
        }
    }
]
The LLM doesn’t execute the tools itself — it generates a request, and the runtime infrastructure handles execution and returns the result. This separation is fundamental to agent safety and control.
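That separation can be sketched as a minimal dispatch layer. The names here are illustrative; a real runtime would also validate arguments against the tool's schema and sandbox execution properly.

```python
import json

# Map tool names to implementations. This calculator evaluates arithmetic
# with builtins disabled; a production runtime would use a real sandbox.
TOOL_REGISTRY = {
    "calculator": lambda args: str(eval(args["expression"], {"__builtins__": {}})),
}


def handle_tool_call(raw_request: str) -> str:
    """Execute a model-generated tool call and return the result as text."""
    request = json.loads(raw_request)  # e.g. {"name": "...", "arguments": {...}}
    tool = TOOL_REGISTRY.get(request["name"])
    if tool is None:
        return f"Error: unknown tool {request['name']!r}"
    try:
        return tool(request["arguments"])
    except Exception as exc:  # report failures back to the model as text
        return f"Error: {exc}"
```

Note that errors are returned to the model as ordinary text rather than raised: this lets the LLM see what went wrong and decide how to recover.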
Memory gives agents the ability to retain and recall information. There are several types:
Short-term memory is the most basic form: the conversation history that fits within the model's context window. Every message in the conversation is included in each new request to the LLM.
Limitation: Context windows are finite (typically 4K to 200K tokens). As conversations grow, older messages must be truncated or summarized.
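A common mitigation is to drop the oldest turns once a token budget is exceeded while keeping the system prompt intact. The sketch below approximates token counts by word count; a real implementation would use the model's tokenizer and might summarize old turns rather than discard them.

```python
def truncate_history(messages, max_tokens=1000):
    """Keep the system message plus the newest messages under the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    kept = []
    used = sum(len(m["content"].split()) for m in system)
    for message in reversed(rest):  # walk from newest to oldest
        cost = len(message["content"].split())
        if used + cost > max_tokens:
            break  # everything older than this is dropped
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```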
Working memory is a scratchpad where the agent stores intermediate results, plans, and observations during a single task execution. It is typically implemented as a structured object or text buffer that the agent can read from and write to.
Long-term memory is information that persists across sessions. It is typically implemented with vector databases, key-value stores, or structured databases.
# See code/augmented_llm_memory.py for the full implementation
class AgentMemory:
    def __init__(self):
        self.short_term = []         # Conversation history
        self.working = {}            # Current task scratchpad
        self.long_term = VectorDB()  # Persistent knowledge

    def remember(self, key, value):
        self.working[key] = value
        self.long_term.store(key, value)

    def recall(self, query, k=5):
        return self.long_term.search(query, k=k)
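Since `VectorDB` above is a placeholder, here is a self-contained version of the same sketch with a toy in-memory store that ranks entries by keyword overlap instead of embedding similarity. It is purely illustrative; a real agent would use an actual vector database.

```python
class InMemoryVectorDB:
    """Toy stand-in for a vector database: keyword overlap, not embeddings."""

    def __init__(self):
        self.entries = []  # (key, value) pairs

    def store(self, key, value):
        self.entries.append((key, value))

    def search(self, query, k=5):
        terms = set(query.lower().split())
        ranked = sorted(
            self.entries,
            key=lambda kv: len(terms & set(str(kv[1]).lower().split())),
            reverse=True,
        )
        return [value for _key, value in ranked[:k]]


class AgentMemory:
    def __init__(self):
        self.short_term = []                 # Conversation history
        self.working = {}                    # Current task scratchpad
        self.long_term = InMemoryVectorDB()  # Persistent knowledge

    def remember(self, key, value):
        self.working[key] = value
        self.long_term.store(key, value)

    def recall(self, query, k=5):
        return self.long_term.search(query, k=k)
```

Usage: `remember` writes to both the scratchpad and the persistent store, so a fact noted during one task can be recalled by similarity search in a later session.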
One of the most significant recent developments in the augmented LLM space is Anthropic’s Model Context Protocol (MCP) — an open standard for connecting LLMs to external data sources and tools.
MCP provides a standardized way for servers to expose tools, data resources, and prompt templates so that any compatible client can discover and use them.
Rather than building custom integrations for every tool, developers can implement an MCP server once and connect it to any MCP-compatible client.
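Under the hood, MCP is built on JSON-RPC 2.0: a client discovers a server's tools with a `tools/list` request and invokes one with `tools/call`. A sketch of what those two client messages look like (the tool name and arguments are illustrative):

```python
import json


def jsonrpc_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request of the kind MCP clients send."""
    message = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server what tools it offers.
list_tools = jsonrpc_request(1, "tools/list")

# Invoke one of them with structured arguments.
call_tool = jsonrpc_request(2, "tools/call", {
    "name": "web_search",
    "arguments": {"query": "agent tool design"},
})
```

Because every MCP server speaks this same wire format, a client that understands these two messages can use any server's tools without custom glue code.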
When building the augmented LLM layer for your agent, focus on two principles:
Tailor capabilities to your use case — Don’t give an agent tools it doesn’t need. A document summarization agent doesn’t need a code executor. A coding agent doesn’t need a weather API.
Provide clear, well-documented interfaces — The LLM will only use tools effectively if it understands what they do. Tool descriptions are prompts — invest in writing them well.
The augmented LLM is the atom from which all agentic molecules are built. Every pattern in the following chapters — from simple prompt chaining to multi-agent orchestration — is ultimately composed of augmented LLM calls.