An agent without memory is like a brilliant expert with complete amnesia: it can reason impressively within a single conversation but forgets everything the moment the interaction ends. Memory and state management are what transform stateless LLM calls into persistent, learning agents.
┌───────────────────────────────────────────────────┐
│                Agent Memory System                │
│                                                   │
│ ┌──────────────┐   ┌────────────────────────┐     │
│ │ Short-Term   │   │       Long-Term        │     │
│ │ (Context     │   │ ┌─────────────────┐    │     │
│ │  Window)     │   │ │ Episodic        │    │     │
│ │              │   │ │ (Past sessions) │    │     │
│ └──────────────┘   │ └─────────────────┘    │     │
│                    │ ┌─────────────────┐    │     │
│ ┌──────────────┐   │ │ Semantic        │    │     │
│ │ Working      │   │ │ (Facts, prefs)  │    │     │
│ │ (Scratchpad) │   │ └─────────────────┘    │     │
│ │              │   │ ┌─────────────────┐    │     │
│ └──────────────┘   │ │ Procedural      │    │     │
│                    │ │ (How-to, skills)│    │     │
│                    │ └─────────────────┘    │     │
│                    └────────────────────────┘     │
└───────────────────────────────────────────────────┘
Short-term memory is the conversation history within the current LLM context window. This is the most fundamental form of memory — every message in the conversation (user turns, assistant turns, tool results) is included in each request to the model.
Challenge: Context windows are finite. A 200K-token window sounds large, but it fills up quickly when tool results, code files, and conversation history accumulate.
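To make the budget pressure concrete, here is a rough sketch of context-budget tracking. The four-characters-per-token ratio is a common rule of thumb, not an exact count, and the 200K window and 4,096-token reply reserve are illustrative numbers; production code would use the model provider's tokenizer.

```python
# Rough context-budget tracking. CHARS_PER_TOKEN is a heuristic,
# not an exact tokenizer.
CHARS_PER_TOKEN = 4

def estimate_tokens(messages):
    """Estimate total tokens across a message list."""
    return sum(len(m["content"]) // CHARS_PER_TOKEN for m in messages)

def context_remaining(messages, window=200_000, reserve=4_096):
    """Tokens left for new content, after reserving room for the reply."""
    return window - reserve - estimate_tokens(messages)

messages = [
    {"role": "system", "content": "You are a helpful agent." * 100},
    {"role": "user", "content": "Summarize the codebase." * 500},
]
print(context_remaining(messages))
```

An agent loop can check this number before each turn and trigger one of the compaction strategies described later once it drops below a threshold.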
Working memory is a temporary store where the agent keeps track of intermediate results, plans, and observations during a single task execution:
# See code/memory.py for the full implementation
class WorkingMemory:
"""Scratchpad for current task execution."""
def __init__(self):
self.plan = None
self.observations = []
self.intermediate_results = {}
self.current_step = 0
def note(self, key, value):
"""Store an intermediate result."""
self.intermediate_results[key] = value
def observe(self, observation):
"""Record an observation from the environment."""
self.observations.append({
"step": self.current_step,
"content": observation
})
def summarize(self):
"""Get a summary for inclusion in the LLM context."""
return {
"plan": self.plan,
"completed_steps": self.current_step,
"key_findings": self.intermediate_results,
"recent_observations": self.observations[-5:]
}
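The summary produced by `summarize()` is only useful once it reaches the model. One way to do that (a sketch; `summary_to_prompt` is a hypothetical helper, and the exact wording is illustrative rather than prescribed by any framework) is to render the dict as a system-message block:

```python
import json

def summary_to_prompt(summary):
    """Render a working-memory summary (the dict shape produced by
    WorkingMemory.summarize) as a text block for the system prompt."""
    lines = [
        "## Task scratchpad",
        f"Plan: {summary['plan']}",
        f"Completed steps: {summary['completed_steps']}",
        f"Key findings: {json.dumps(summary['key_findings'])}",
    ]
    for obs in summary["recent_observations"]:
        lines.append(f"- step {obs['step']}: {obs['content']}")
    return "\n".join(lines)

summary = {
    "plan": "1. locate config  2. patch timeout",
    "completed_steps": 1,
    "key_findings": {"config_path": "app/settings.py"},
    "recent_observations": [{"step": 1, "content": "found settings.py"}],
}
print(summary_to_prompt(summary))
```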
Episodic memory holds records of complete past interactions — what was asked, what the agent did, what worked, and what didn’t:
# See code/memory.py for the full implementation
from datetime import datetime
class EpisodicMemory:
"""Store and retrieve past interaction episodes."""
def __init__(self, vector_store):
self.store = vector_store
def record_episode(self, task, actions, outcome, success):
episode = {
"task": task,
"actions": actions,
"outcome": outcome,
"success": success,
"timestamp": datetime.now().isoformat()
}
self.store.add(
text=f"Task: {task}\nOutcome: {outcome}\nSuccess: {success}",
metadata=episode
)
def recall_similar(self, current_task, k=3):
"""Find past episodes similar to the current task."""
return self.store.search(current_task, k=k)
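The `vector_store` these memory classes expect is any object with `add(text, metadata)` and `search(query, k)`. A minimal stand-in makes the interface concrete (this is a toy: real stores score by embedding similarity, while this one scores by word overlap):

```python
from types import SimpleNamespace

class ToyVectorStore:
    """Minimal stand-in for the vector_store interface assumed above:
    add(text, metadata) and search(query, k). Real implementations use
    embeddings; this toy ranks by word overlap with the query."""
    def __init__(self):
        self.items = []

    def add(self, text, metadata=None):
        self.items.append(SimpleNamespace(text=text, metadata=metadata or {}))

    def search(self, query, k=3):
        words = set(query.lower().split())
        scored = sorted(
            self.items,
            key=lambda it: len(words & set(it.text.lower().split())),
            reverse=True,
        )
        return scored[:k]

store = ToyVectorStore()
store.add("Task: fix login bug\nOutcome: patched auth.py\nSuccess: True",
          metadata={"task": "fix login bug"})
store.add("Task: write docs\nOutcome: added README\nSuccess: True",
          metadata={"task": "write docs"})
print(store.search("login bug in auth", k=1)[0].metadata["task"])
```

Swapping in a real backend (Chroma, Qdrant, etc.) only requires preserving this two-method surface.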
Semantic memory holds accumulated facts, user preferences, and domain knowledge:
# See code/memory.py for the full implementation
class SemanticMemory:
"""Store and retrieve facts and knowledge."""
def __init__(self, vector_store):
self.store = vector_store
self.facts = {} # Key-value for quick lookups
def learn(self, fact, category=None):
self.store.add(text=fact, metadata={"category": category})
def learn_fact(self, key, value):
self.facts[key] = value
def recall(self, query, k=5):
return self.store.search(query, k=k)
def get_fact(self, key):
return self.facts.get(key)
Procedural memory captures learned approaches and strategies that worked in the past:
class ProceduralMemory:
"""Store successful strategies and approaches."""
def __init__(self):
self.strategies = {}
def record_strategy(self, task_type, strategy, success_rate):
if task_type not in self.strategies:
self.strategies[task_type] = []
self.strategies[task_type].append({
"strategy": strategy,
"success_rate": success_rate
})
def best_strategy(self, task_type):
if task_type in self.strategies:
return max(self.strategies[task_type],
key=lambda s: s["success_rate"])
return None
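In use, the agent records outcomes as tasks complete and consults `best_strategy` when planning a new task of the same type. A quick sketch (the class body is repeated here so the snippet runs standalone; the task names and success rates are made up):

```python
class ProceduralMemory:
    """Store successful strategies and approaches (as defined above)."""
    def __init__(self):
        self.strategies = {}

    def record_strategy(self, task_type, strategy, success_rate):
        self.strategies.setdefault(task_type, []).append(
            {"strategy": strategy, "success_rate": success_rate}
        )

    def best_strategy(self, task_type):
        if task_type in self.strategies:
            return max(self.strategies[task_type],
                       key=lambda s: s["success_rate"])
        return None

pm = ProceduralMemory()
pm.record_strategy("code_review", "lint first, then read diffs", 0.9)
pm.record_strategy("code_review", "read diffs top to bottom", 0.6)
print(pm.best_strategy("code_review")["strategy"])
# A real agent would inject the winning strategy into its planning prompt.
```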
The most immediate challenge in agent memory is managing the context window. As conversations grow, you need strategies to keep the most relevant information within the token budget.
The simplest strategy is a sliding window: keep the system prompt plus the N most recent messages:
def sliding_window(messages, max_messages=20):
"""Keep the system prompt and the N most recent messages."""
system = [m for m in messages if m["role"] == "system"]
others = [m for m in messages if m["role"] != "system"]
return system + others[-max_messages:]
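A quick demonstration of the trade-off: old turns are dropped entirely, but the system prompt always survives. (The function is repeated so the demo runs standalone.)

```python
def sliding_window(messages, max_messages=20):
    """Keep the system prompt and the N most recent messages
    (same function as above)."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    return system + others[-max_messages:]

# Build a synthetic 30-turn history behind one system prompt.
history = [{"role": "system", "content": "You are a coding agent."}]
for i in range(30):
    history.append({"role": "user", "content": f"turn {i}"})

trimmed = sliding_window(history, max_messages=20)
print(len(trimmed))           # system prompt + the 20 most recent turns
print(trimmed[1]["content"])  # oldest surviving user turn
```

Everything before the surviving window is simply gone, which is why sliding windows are usually combined with summarization or selective retrieval below.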
A second strategy is summarization: periodically condense older messages into a compact representation:
# See code/memory.py for the full implementation
def summarize_context(llm, messages, keep_recent=5):
"""Summarize older messages to fit context window."""
system = [m for m in messages if m["role"] == "system"]
others = [m for m in messages if m["role"] != "system"]
if len(others) <= keep_recent:
return messages
old_messages = others[:-keep_recent]
recent_messages = others[-keep_recent:]
summary = llm.generate(
f"Summarize this conversation history, preserving key "
f"facts, decisions, and context:\n\n"
f"{format_messages(old_messages)}"
)
return system + [
{"role": "system", "content": f"Previous conversation summary:\n{summary}"}
] + recent_messages
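The snippet above references a `format_messages` helper that lives in `code/memory.py`; one plausible implementation, together with a stub LLM so the end-to-end flow is testable, might look like this (the stub's canned summary is obviously not what a real model would return):

```python
def format_messages(messages):
    """One plausible format_messages (the real helper is in code/memory.py)."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

class StubLLM:
    """Stand-in for a real model client, returning a canned summary."""
    def generate(self, prompt):
        return "User is debugging auth; decided to patch session timeout."

def summarize_context(llm, messages, keep_recent=5):
    # Same logic as above: compress everything but the last few turns.
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if len(others) <= keep_recent:
        return messages
    summary = llm.generate(
        "Summarize this conversation history, preserving key facts, "
        "decisions, and context:\n\n" + format_messages(others[:-keep_recent])
    )
    return system + [
        {"role": "system", "content": f"Previous conversation summary:\n{summary}"}
    ] + others[-keep_recent:]

msgs = [{"role": "system", "content": "Be concise."}] + [
    {"role": "user", "content": f"turn {i}"} for i in range(10)
]
compacted = summarize_context(StubLLM(), msgs, keep_recent=3)
print(len(compacted))  # system prompt + summary message + recent turns
```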
A third strategy is selective retrieval: store all messages in a vector database and retrieve only the ones most relevant to the current query:
# See code/memory.py for the full implementation
class SelectiveMemory:
def __init__(self, vector_store):
self.store = vector_store
self.all_messages = []
def add_message(self, message):
self.all_messages.append(message)
self.store.add(
text=message["content"],
metadata={"index": len(self.all_messages) - 1}
)
def get_relevant_context(self, query, k=10):
"""Retrieve the most relevant past messages for a query."""
results = self.store.search(query, k=k)
indices = [r.metadata["index"] for r in results]
return [self.all_messages[i] for i in sorted(indices)]
Conversation state tracks where the user is in a multi-step process:
class ConversationState:
def __init__(self):
self.stage = "initial"
self.collected_info = {}
self.pending_actions = []
def transition(self, new_stage, **kwargs):
self.stage = new_stage
self.collected_info.update(kwargs)
def to_context(self):
return (
f"Current stage: {self.stage}\n"
f"Information collected: {self.collected_info}\n"
f"Pending actions: {self.pending_actions}"
)
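In practice each user turn advances the state machine, and `to_context()` is prepended to the next LLM request. A sketch with a hypothetical booking flow (the class body is repeated so the demo runs standalone; the stage names are made up):

```python
class ConversationState:
    """State machine for a multi-step flow (as defined above)."""
    def __init__(self):
        self.stage = "initial"
        self.collected_info = {}
        self.pending_actions = []

    def transition(self, new_stage, **kwargs):
        self.stage = new_stage
        self.collected_info.update(kwargs)

    def to_context(self):
        return (
            f"Current stage: {self.stage}\n"
            f"Information collected: {self.collected_info}\n"
            f"Pending actions: {self.pending_actions}"
        )

state = ConversationState()
state.transition("collecting_details", destination="Lisbon")
state.transition("confirming", dates="May 3-7")
print(state.to_context())
```

Because the collected information accumulates across transitions, the model never has to re-extract it from earlier turns.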
Task state tracks progress within a complex task:
class TaskState:
def __init__(self, goal):
self.goal = goal
self.plan = []
self.completed_steps = []
self.current_step = None
self.artifacts = {} # Files, data, etc.
def checkpoint(self):
"""Create a serializable checkpoint."""
return {
"goal": self.goal,
"plan": self.plan,
"completed": self.completed_steps,
"current": self.current_step,
"artifacts": self.artifacts
}
def restore(self, checkpoint):
"""Restore from a checkpoint."""
self.goal = checkpoint["goal"]
self.plan = checkpoint["plan"]
self.completed_steps = checkpoint["completed"]
self.current_step = checkpoint["current"]
self.artifacts = checkpoint["artifacts"]
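A checkpoint is only useful if it survives a crash, so persistence should be atomic. One sketch, assuming the checkpoint dict is JSON-serializable (binary artifacts would need to be written elsewhere and referenced by path):

```python
import json
import os
import tempfile

def save_checkpoint(checkpoint, path):
    """Persist a checkpoint atomically: write to a temp file, then
    rename over the target, so a crash mid-write never leaves a
    half-written checkpoint behind."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(checkpoint, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)

checkpoint = {
    "goal": "migrate database",
    "plan": ["dump", "transform", "load"],
    "completed": ["dump"],
    "current": "transform",
    "artifacts": {"dump_file": "backup.sql"},
}
save_checkpoint(checkpoint, "task_state.json")
restored = load_checkpoint("task_state.json")
print(restored["current"])
```

The restored dict plugs straight into `TaskState.restore`, letting an interrupted agent resume from the last completed step rather than starting over.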
For agents that need to remember across sessions, several storage backends can be used:
| Storage Type | Best For | Limitations |
|---|---|---|
| Vector Database (Pinecone, Chroma, Qdrant) | Semantic search over memories | Requires embedding model |
| Key-Value Store (Redis) | Fast fact lookup | No semantic search |
| Relational Database (PostgreSQL) | Structured data, complex queries | Rigid schema |
| Graph Database (Neo4j) | Relationship-rich knowledge | Complex setup |
| File System | Simple persistence | No search capability |
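The file-system row deserves more credit than it usually gets: for single-user agents, appending JSON lines to a file is often enough. A minimal sketch (the file name and record fields are illustrative), matching the table's caveat that retrieval is a linear scan with no semantic search:

```python
import json
from pathlib import Path

class FileMemory:
    """Simplest persistent memory: append JSON lines to a file.
    Durable and easy to inspect, but load() is a full linear scan
    and there is no semantic search."""
    def __init__(self, path):
        self.path = Path(path)

    def append(self, record):
        with self.path.open("a") as f:
            f.write(json.dumps(record) + "\n")

    def load(self):
        if not self.path.exists():
            return []
        with self.path.open() as f:
            return [json.loads(line) for line in f if line.strip()]

mem = FileMemory("agent_memory.jsonl")
mem.append({"type": "preference", "text": "user prefers tabs"})
mem.append({"type": "fact", "text": "repo uses Python 3.12"})
print(mem.load()[-1]["text"])
```

Graduating to a vector database later is straightforward: replay the JSONL file through the new store's `add` method.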
Memory adds complexity. Use it when:

- Users return across sessions and expect the agent to remember earlier context
- Tasks run long enough to outgrow a single context window
- The agent can improve by learning from past successes and failures
- Personalization (preferences, recurring facts) meaningfully improves outcomes

Avoid investing in memory when:

- Each task is self-contained and fits comfortably in the context window
- Interactions are one-shot, with no returning users
- A longer prompt or retrieval over static documents would do the job