Chapter 12 – Agent Memory and State

Remembering Across Time and Tasks

An agent without memory is like a brilliant expert with complete amnesia — it can reason impressively within a single conversation but forgets everything the moment the interaction ends. Memory and state management are what transform stateless LLM calls into persistent, learning agents.

Types of Agent Memory

    ┌─────────────────────────────────────────────────┐
    │               Agent Memory System               │
    │                                                 │
    │  ┌──────────────┐   ┌────────────────────────┐  │
    │  │  Short-Term  │   │       Long-Term        │  │
    │  │  (Context    │   │  ┌──────────────────┐  │  │
    │  │   Window)    │   │  │     Episodic     │  │  │
    │  │              │   │  │  (Past sessions) │  │  │
    │  └──────────────┘   │  └──────────────────┘  │  │
    │                     │  ┌──────────────────┐  │  │
    │  ┌──────────────┐   │  │     Semantic     │  │  │
    │  │  Working     │   │  │  (Facts, prefs)  │  │  │
    │  │  (Scratchpad)│   │  └──────────────────┘  │  │
    │  └──────────────┘   │  ┌──────────────────┐  │  │
    │                     │  │    Procedural    │  │  │
    │                     │  │ (How-to, skills) │  │  │
    │                     │  └──────────────────┘  │  │
    │                     └────────────────────────┘  │
    └─────────────────────────────────────────────────┘

Short-Term Memory: The Context Window

The conversation history within the current LLM context window. This is the most fundamental form of memory — every message in the conversation (user turns, assistant turns, tool results) is included in each request to the model.

Challenge: Context windows are finite. A 200K-token window sounds large, but it fills up quickly when tool results, code files, and conversation history accumulate.
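Some back-of-the-envelope arithmetic makes the point. A common heuristic (approximate, not exact) is that English text runs about 4 characters per token, so a few large artifacts claim a big share of the budget:

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate using the ~4 characters/token heuristic."""
    return len(text) // 4

budget = 200_000
code_file = "x" * 80_000      # one ~80 KB source file
tool_results = "y" * 200_000  # accumulated tool output over a session
used = estimate_tokens(code_file) + estimate_tokens(tool_results)
print(f"{used} of {budget} tokens used ({used / budget:.0%})")  # 70000 → 35%
```

One source file plus a session's worth of tool output already consumes a third of the window, before any conversation history is counted.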

Working Memory: The Scratchpad

A temporary store where the agent keeps track of intermediate results, plans, and observations during a single task execution:

# See code/memory.py for the full implementation

class WorkingMemory:
    """Scratchpad for current task execution."""
    
    def __init__(self):
        self.plan = None
        self.observations = []
        self.intermediate_results = {}
        self.current_step = 0
    
    def note(self, key, value):
        """Store an intermediate result."""
        self.intermediate_results[key] = value
    
    def observe(self, observation):
        """Record an observation from the environment."""
        self.observations.append({
            "step": self.current_step,
            "content": observation
        })
    
    def summarize(self):
        """Get a summary for inclusion in the LLM context."""
        return {
            "plan": self.plan,
            "completed_steps": self.current_step,
            "key_findings": self.intermediate_results,
            "recent_observations": self.observations[-5:]
        }
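The summary dict produced by `summarize()` is typically rendered into a compact prompt section rather than injected as raw JSON. A minimal sketch of such a renderer (the `render_summary` name and the exact formatting are assumptions, not part of `code/memory.py`):

```python
def render_summary(summary: dict) -> str:
    """Render a working-memory summary dict into a prompt section."""
    lines = [
        f"Plan: {summary.get('plan') or '(none yet)'}",
        f"Completed steps: {summary.get('completed_steps', 0)}",
    ]
    for key, value in summary.get("key_findings", {}).items():
        lines.append(f"- {key}: {value}")
    for obs in summary.get("recent_observations", []):
        lines.append(f"[step {obs['step']}] {obs['content']}")
    return "\n".join(lines)

text = render_summary({
    "plan": "Refactor the parser",
    "completed_steps": 2,
    "key_findings": {"entry_point": "parse()"},
    "recent_observations": [{"step": 2, "content": "Tests pass"}],
})
print(text)
```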

Episodic Memory: Past Experiences

Records of complete past interactions — what was asked, what the agent did, what worked, and what didn’t:

# See code/memory.py for the full implementation

from datetime import datetime

class EpisodicMemory:
    """Store and retrieve past interaction episodes."""
    
    def __init__(self, vector_store):
        self.store = vector_store
    
    def record_episode(self, task, actions, outcome, success):
        episode = {
            "task": task,
            "actions": actions,
            "outcome": outcome,
            "success": success,
            "timestamp": datetime.now().isoformat()
        }
        self.store.add(
            text=f"Task: {task}\nOutcome: {outcome}\nSuccess: {success}",
            metadata=episode
        )
    
    def recall_similar(self, current_task, k=3):
        """Find past episodes similar to the current task."""
        return self.store.search(current_task, k=k)
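EpisodicMemory only assumes its `vector_store` exposes `add(text=..., metadata=...)` and `search(query, k=...)`. For experimentation, a toy stand-in can satisfy that interface with word-overlap scoring; a real backend (Chroma, Qdrant, etc.) would embed the text and rank by cosine similarity instead. The `KeywordStore` name is made up for this sketch:

```python
from types import SimpleNamespace

class KeywordStore:
    """Toy stand-in for a vector store, scoring by word overlap."""

    def __init__(self):
        self.items = []  # (text, metadata) pairs

    def add(self, text, metadata=None):
        self.items.append((text, metadata))

    def search(self, query, k=3):
        query_words = set(query.lower().split())
        scored = sorted(
            self.items,
            key=lambda item: len(query_words & set(item[0].lower().split())),
            reverse=True,
        )
        return [SimpleNamespace(text=t, metadata=m) for t, m in scored[:k]]

store = KeywordStore()
store.add("Task: deploy service\nOutcome: rolled back", metadata={"success": False})
store.add("Task: write unit tests\nOutcome: merged", metadata={"success": True})
top = store.search("deploy the service", k=1)[0]  # the deploy episode scores highest
```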

Semantic Memory: Facts and Knowledge

Accumulated facts, user preferences, and domain knowledge:

# See code/memory.py for the full implementation

class SemanticMemory:
    """Store and retrieve facts and knowledge."""
    
    def __init__(self, vector_store):
        self.store = vector_store
        self.facts = {}  # Key-value for quick lookups
    
    def learn(self, fact, category=None):
        self.store.add(text=fact, metadata={"category": category})
    
    def learn_fact(self, key, value):
        self.facts[key] = value
    
    def recall(self, query, k=5):
        return self.store.search(query, k=k)
    
    def get_fact(self, key):
        return self.facts.get(key)

Procedural Memory: Skills and Strategies

Learned approaches and strategies that worked in the past:

class ProceduralMemory:
    """Store successful strategies and approaches."""
    
    def __init__(self):
        self.strategies = {}
    
    def record_strategy(self, task_type, strategy, success_rate):
        if task_type not in self.strategies:
            self.strategies[task_type] = []
        self.strategies[task_type].append({
            "strategy": strategy,
            "success_rate": success_rate
        })
    
    def best_strategy(self, task_type):
        if task_type in self.strategies:
            return max(self.strategies[task_type], 
                      key=lambda s: s["success_rate"])
        return None
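As written, `record_strategy` stores a fixed success rate. In practice you would update the rate as new outcomes arrive; one hedged sketch uses an exponential moving average so recent outcomes weigh more (the `alpha` value and entry shape are assumptions):

```python
def update_success_rate(entry: dict, succeeded: bool, alpha: float = 0.3) -> dict:
    """Blend the newest outcome into the running success rate (EMA)."""
    outcome = 1.0 if succeeded else 0.0
    entry["success_rate"] = (1 - alpha) * entry["success_rate"] + alpha * outcome
    return entry

entry = {"strategy": "search, filter, summarize", "success_rate": 0.5}
update_success_rate(entry, succeeded=True)
print(round(entry["success_rate"], 2))  # 0.7 * 0.5 + 0.3 * 1.0 = 0.65
```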

Context Window Management

The most immediate challenge in agent memory is managing the context window. As conversations grow, you need strategies to keep the most relevant information within the token budget.

Sliding Window

Keep only the N most recent messages:

def sliding_window(messages, max_messages=20):
    """Keep the system prompt and the N most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    return system + others[-max_messages:]
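One pitfall with a naive cutoff: it can separate a tool call from its result, leaving the model an orphaned tool message it cannot interpret. A sketch that drops orphaned leading tool results (the `"tool"` role convention and function name are assumptions for illustration):

```python
def sliding_window_safe(messages, max_messages=20):
    """Keep the system prompt plus recent messages, but never start
    the window on a tool result whose originating call was cut off."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    window = others[-max_messages:]
    while window and window[0]["role"] == "tool":
        window = window[1:]
    return system + window

msgs = [{"role": "system", "content": "You are helpful."}]
msgs += [{"role": "user", "content": f"q{i}"} for i in range(10)]
msgs.insert(5, {"role": "tool", "content": "result"})  # tool result mid-history
trimmed = sliding_window_safe(msgs, max_messages=7)
# Window would have started on the orphaned tool result; it is dropped.
```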

Summarization

Periodically summarize older messages into a compact representation:

# See code/memory.py for the full implementation

def summarize_context(llm, messages, keep_recent=5):
    """Summarize older messages to fit context window."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    
    if len(others) <= keep_recent:
        return messages
    
    old_messages = others[:-keep_recent]
    recent_messages = others[-keep_recent:]
    
    summary = llm.generate(
        f"Summarize this conversation history, preserving key "
        f"facts, decisions, and context:\n\n"
        f"{format_messages(old_messages)}"
    )
    
    return system + [
        {"role": "system", "content": f"Previous conversation summary:\n{summary}"}
    ] + recent_messages
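`summarize_context` relies on a `format_messages` helper to flatten the old messages into the summarizer prompt. A minimal version of such a helper might look like this (one plausible rendering; the full implementation lives in code/memory.py):

```python
def format_messages(messages) -> str:
    """Render a message list as plain text for the summarizer prompt."""
    return "\n".join(f"{m['role'].upper()}: {m['content']}" for m in messages)

print(format_messages([
    {"role": "user", "content": "Rename the module"},
    {"role": "assistant", "content": "Done, renamed to core.py"},
]))
```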

Selective Retrieval

Store all messages in a vector database and retrieve only the most relevant ones:

# See code/memory.py for the full implementation

class SelectiveMemory:
    def __init__(self, vector_store):
        self.store = vector_store
        self.all_messages = []
    
    def add_message(self, message):
        self.all_messages.append(message)
        self.store.add(
            text=message["content"],
            metadata={"index": len(self.all_messages) - 1}
        )
    
    def get_relevant_context(self, query, k=10):
        """Retrieve the most relevant past messages for a query."""
        results = self.store.search(query, k=k)
        indices = [r.metadata["index"] for r in results]
        return [self.all_messages[i] for i in sorted(indices)]
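Relevance-only retrieval can drop the very latest turns, which the user expects the agent to remember verbatim. A common refinement merges retrieved messages with the most recent ones, deduplicated and in original order (a sketch; the index-based representation mirrors `SelectiveMemory` above but the helper itself is hypothetical):

```python
def merge_context(all_messages, relevant_indices, keep_recent=3):
    """Union of relevant and recent message indices, in original order."""
    recent = range(max(0, len(all_messages) - keep_recent), len(all_messages))
    chosen = sorted(set(relevant_indices) | set(recent))
    return [all_messages[i] for i in chosen]

history = [{"role": "user", "content": f"m{i}"} for i in range(8)]
ctx = merge_context(history, relevant_indices=[1, 4], keep_recent=2)
print([m["content"] for m in ctx])  # → ['m1', 'm4', 'm6', 'm7']
```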

State Management Patterns

Conversation State

Tracking where the user is in a multi-step process:

class ConversationState:
    def __init__(self):
        self.stage = "initial"
        self.collected_info = {}
        self.pending_actions = []
    
    def transition(self, new_stage, **kwargs):
        self.stage = new_stage
        self.collected_info.update(kwargs)
    
    def to_context(self):
        return (
            f"Current stage: {self.stage}\n"
            f"Information collected: {self.collected_info}\n"
            f"Pending actions: {self.pending_actions}"
        )
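The `transition` method above accepts any stage. Multi-step flows become much easier to debug when legal moves are explicit; a sketch with an allowed-transitions table (the stage names here are hypothetical):

```python
# Which stages may follow each stage (hypothetical flow).
ALLOWED = {
    "initial": {"collecting_info"},
    "collecting_info": {"confirming", "collecting_info"},
    "confirming": {"executing", "collecting_info"},
    "executing": {"done"},
}

def guarded_transition(stage: str, new_stage: str) -> str:
    """Move to new_stage only if the transition table permits it."""
    if new_stage not in ALLOWED.get(stage, set()):
        raise ValueError(f"illegal transition: {stage} -> {new_stage}")
    return new_stage

stage = guarded_transition("initial", "collecting_info")
```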

Task State

Tracking progress within a complex task:

class TaskState:
    def __init__(self, goal):
        self.goal = goal
        self.plan = []
        self.completed_steps = []
        self.current_step = None
        self.artifacts = {}  # Files, data, etc.
    
    def checkpoint(self):
        """Create a serializable checkpoint."""
        return {
            "goal": self.goal,
            "plan": self.plan,
            "completed": self.completed_steps,
            "current": self.current_step,
            "artifacts": self.artifacts
        }
    
    def restore(self, checkpoint):
        """Restore from a checkpoint."""
        self.goal = checkpoint["goal"]
        self.plan = checkpoint["plan"]
        self.completed_steps = checkpoint["completed"]
        self.current_step = checkpoint["current"]
        self.artifacts = checkpoint["artifacts"]
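A checkpoint is only useful if it survives a crash, which means writing it somewhere durable. A minimal sketch persisting the checkpoint dict with `json` (the file location is arbitrary; note this only works while `artifacts` holds JSON-serializable values):

```python
import json
import os
import tempfile

checkpoint = {
    "goal": "migrate the database",
    "plan": ["dump", "transform", "load"],
    "completed": ["dump"],
    "current": "transform",
    "artifacts": {"dump_file": "backup.sql"},
}

path = os.path.join(tempfile.gettempdir(), "task_checkpoint.json")
with open(path, "w") as f:
    json.dump(checkpoint, f)

# Later (or after a restart): reload and pass to TaskState.restore().
with open(path) as f:
    restored = json.load(f)
```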

Persistent Memory Storage

For agents that need to remember across sessions, several storage backends can be used:

  Storage Type                                  Best For                           Limitations
  Vector Database (Pinecone, Chroma, Qdrant)    Semantic search over memories      Requires embedding model
  Key-Value Store (Redis)                       Fast fact lookup                   No semantic search
  Relational Database (PostgreSQL)              Structured data, complex queries   Rigid schema
  Graph Database (Neo4j)                        Relationship-rich knowledge        Complex setup
  File System                                   Simple persistence                 No search capability

When to Invest in Memory

Memory adds complexity. Use it when:

  - Users return across sessions and expect the agent to remember them
  - Tasks span multiple sessions or must survive restarts
  - The agent should improve by learning from past successes and failures
  - Personalization (preferences, accumulated domain facts) materially improves output

Avoid investing in memory when:

  - Each task is self-contained and fits comfortably in the context window
  - Interactions are one-shot, with no returning users
  - Privacy or data-retention constraints make storing user data risky

Practical Tips

  1. Start without long-term memory — Simple conversation history is sufficient for many use cases
  2. Summarize before storing — Store distilled insights, not raw conversation logs
  3. Prune aggressively — Old, irrelevant memories add noise
  4. Test memory retrieval quality — The value of memory depends on retrieving the right memories at the right time
  5. Consider privacy — Persistent memory creates data retention obligations
  6. Version your memory schema — As your agent evolves, its memory structure will too
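Tip 6 in practice: store a schema version with each memory record and migrate lazily when a record is read. A sketch of stepwise migration (version numbers and field names are made up for illustration):

```python
CURRENT_VERSION = 2

def migrate(record: dict) -> dict:
    """Upgrade a memory record to the current schema, one version at a time."""
    version = record.get("version", 1)
    if version == 1:
        # v1 stored only text; v2 adds an explicit category field.
        record = {"version": 2, "text": record["text"], "category": None}
        version = 2
    return record

old = {"version": 1, "text": "User prefers concise answers"}
print(migrate(old))
```

Migrating one step at a time keeps each upgrade small and lets very old records pass through every intermediate version.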
