An agent without memory is like a brilliant expert with complete amnesia: it can reason impressively within a single conversation but forgets everything the moment the interaction ends. Memory and state management are what transform stateless LLM calls into persistent, learning agents.
┌───────────────────────────────────────────────────┐
│                Agent Memory System                │
│                                                   │
│ ┌──────────────┐   ┌────────────────────────┐     │
│ │ Short-Term   │   │       Long-Term        │     │
│ │ (Context     │   │ ┌─────────────────┐    │     │
│ │  Window)     │   │ │ Episodic        │    │     │
│ │              │   │ │ (Past sessions) │    │     │
│ └──────────────┘   │ └─────────────────┘    │     │
│                    │ ┌─────────────────┐    │     │
│ ┌──────────────┐   │ │ Semantic        │    │     │
│ │ Working      │   │ │ (Facts, prefs)  │    │     │
│ │ (Scratchpad) │   │ └─────────────────┘    │     │
│ │              │   │ ┌─────────────────┐    │     │
│ └──────────────┘   │ │ Procedural      │    │     │
│                    │ │ (How-to, skills)│    │     │
│                    │ └─────────────────┘    │     │
│                    └────────────────────────┘     │
└───────────────────────────────────────────────────┘
Short-term memory is the conversation history within the current LLM context window. This is the most fundamental form of memory — every message in the conversation (user turns, assistant turns, tool results) is included in each request to the model.
Challenge: Context windows are finite. A 200K-token window sounds large, but it fills up quickly when tool results, code files, and conversation history accumulate.
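To make the budget pressure concrete, here is a rough sketch of context-budget tracking. The four-characters-per-token ratio is a common rule of thumb, not an exact count, and the 200K window and 4,096-token reply reserve are illustrative numbers; production code would use the model provider's tokenizer.

```python
# Rough context-budget tracking. CHARS_PER_TOKEN is a heuristic,
# not an exact tokenizer.
CHARS_PER_TOKEN = 4

def estimate_tokens(messages):
    """Estimate total tokens across a message list."""
    return sum(len(m["content"]) // CHARS_PER_TOKEN for m in messages)

def context_remaining(messages, window=200_000, reserve=4_096):
    """Tokens left for new content, after reserving room for the reply."""
    return window - reserve - estimate_tokens(messages)

messages = [
    {"role": "system", "content": "You are a helpful agent." * 100},
    {"role": "user", "content": "Summarize the codebase." * 500},
]
print(context_remaining(messages))
```

An agent loop can check this number before each turn and trigger one of the compaction strategies described later once it drops below a threshold.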
Working memory is a temporary store where the agent keeps track of intermediate results, plans, and observations during a single task execution:
# See code/memory.py for the full implementation
class WorkingMemory:
"""Scratchpad for current task execution."""
def __init__(self):
self.plan = None
self.observations = []
self.intermediate_results = {}
self.current_step = 0
def note(self, key, value):
"""Store an intermediate result."""
self.intermediate_results[key] = value
def observe(self, observation):
"""Record an observation from the environment."""
self.observations.append({
"step": self.current_step,
"content": observation
})
def summarize(self):
"""Get a summary for inclusion in the LLM context."""
return {
"plan": self.plan,
"completed_steps": self.current_step,
"key_findings": self.intermediate_results,
"recent_observations": self.observations[-5:]
}
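The summary produced by `summarize()` is only useful once it reaches the model. One way to do that (a sketch; `summary_to_prompt` is a hypothetical helper, and the exact wording is illustrative rather than prescribed by any framework) is to render the dict as a system-message block:

```python
import json

def summary_to_prompt(summary):
    """Render a working-memory summary (the dict shape produced by
    WorkingMemory.summarize) as a text block for the system prompt."""
    lines = [
        "## Task scratchpad",
        f"Plan: {summary['plan']}",
        f"Completed steps: {summary['completed_steps']}",
        f"Key findings: {json.dumps(summary['key_findings'])}",
    ]
    for obs in summary["recent_observations"]:
        lines.append(f"- step {obs['step']}: {obs['content']}")
    return "\n".join(lines)

summary = {
    "plan": "1. locate config  2. patch timeout",
    "completed_steps": 1,
    "key_findings": {"config_path": "app/settings.py"},
    "recent_observations": [{"step": 1, "content": "found settings.py"}],
}
print(summary_to_prompt(summary))
```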
Episodic memory holds records of complete past interactions — what was asked, what the agent did, what worked, and what didn’t:
# See code/memory.py for the full implementation
from datetime import datetime
class EpisodicMemory:
"""Store and retrieve past interaction episodes."""
def __init__(self, vector_store):
self.store = vector_store
def record_episode(self, task, actions, outcome, success):
episode = {
"task": task,
"actions": actions,
"outcome": outcome,
"success": success,
"timestamp": datetime.now().isoformat()
}
self.store.add(
text=f"Task: {task}\nOutcome: {outcome}\nSuccess: {success}",
metadata=episode
)
def recall_similar(self, current_task, k=3):
"""Find past episodes similar to the current task."""
return self.store.search(current_task, k=k)
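The `vector_store` these memory classes expect is any object with `add(text, metadata)` and `search(query, k)`. A minimal stand-in makes the interface concrete (this is a toy: real stores score by embedding similarity, while this one scores by word overlap):

```python
from types import SimpleNamespace

class ToyVectorStore:
    """Minimal stand-in for the vector_store interface assumed above:
    add(text, metadata) and search(query, k). Real implementations use
    embeddings; this toy ranks by word overlap with the query."""
    def __init__(self):
        self.items = []

    def add(self, text, metadata=None):
        self.items.append(SimpleNamespace(text=text, metadata=metadata or {}))

    def search(self, query, k=3):
        words = set(query.lower().split())
        scored = sorted(
            self.items,
            key=lambda it: len(words & set(it.text.lower().split())),
            reverse=True,
        )
        return scored[:k]

store = ToyVectorStore()
store.add("Task: fix login bug\nOutcome: patched auth.py\nSuccess: True",
          metadata={"task": "fix login bug"})
store.add("Task: write docs\nOutcome: added README\nSuccess: True",
          metadata={"task": "write docs"})
print(store.search("login bug in auth", k=1)[0].metadata["task"])
```

Swapping in a real backend (Chroma, Qdrant, etc.) only requires preserving this two-method surface.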
Semantic memory holds accumulated facts, user preferences, and domain knowledge:
# See code/memory.py for the full implementation
class SemanticMemory:
"""Store and retrieve facts and knowledge."""
def __init__(self, vector_store):
self.store = vector_store
self.facts = {} # Key-value for quick lookups
def learn(self, fact, category=None):
self.store.add(text=fact, metadata={"category": category})
def learn_fact(self, key, value):
self.facts[key] = value
def recall(self, query, k=5):
return self.store.search(query, k=k)
def get_fact(self, key):
return self.facts.get(key)
Procedural memory captures learned approaches and strategies that worked in the past:
class ProceduralMemory:
"""Store successful strategies and approaches."""
def __init__(self):
self.strategies = {}
def record_strategy(self, task_type, strategy, success_rate):
if task_type not in self.strategies:
self.strategies[task_type] = []
self.strategies[task_type].append({
"strategy": strategy,
"success_rate": success_rate
})
def best_strategy(self, task_type):
if task_type in self.strategies:
return max(self.strategies[task_type],
key=lambda s: s["success_rate"])
return None
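In use, the agent records outcomes as tasks complete and consults `best_strategy` when planning a new task of the same type. A quick sketch (the class body is repeated here so the snippet runs standalone; the task names and success rates are made up):

```python
class ProceduralMemory:
    """Store successful strategies and approaches (as defined above)."""
    def __init__(self):
        self.strategies = {}

    def record_strategy(self, task_type, strategy, success_rate):
        self.strategies.setdefault(task_type, []).append(
            {"strategy": strategy, "success_rate": success_rate}
        )

    def best_strategy(self, task_type):
        if task_type in self.strategies:
            return max(self.strategies[task_type],
                       key=lambda s: s["success_rate"])
        return None

pm = ProceduralMemory()
pm.record_strategy("code_review", "lint first, then read diffs", 0.9)
pm.record_strategy("code_review", "read diffs top to bottom", 0.6)
print(pm.best_strategy("code_review")["strategy"])
# A real agent would inject the winning strategy into its planning prompt.
```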
The most immediate challenge in agent memory is managing the context window. As conversations grow, you need strategies to keep the most relevant information within the token budget.
The simplest strategy is a sliding window: keep the system prompt plus the N most recent messages:
def sliding_window(messages, max_messages=20):
"""Keep the system prompt and the N most recent messages."""
system = [m for m in messages if m["role"] == "system"]
others = [m for m in messages if m["role"] != "system"]
return system + others[-max_messages:]
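A quick demonstration of the trade-off: old turns are dropped entirely, but the system prompt always survives. (The function is repeated so the demo runs standalone.)

```python
def sliding_window(messages, max_messages=20):
    """Keep the system prompt and the N most recent messages
    (same function as above)."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    return system + others[-max_messages:]

# Build a synthetic 30-turn history behind one system prompt.
history = [{"role": "system", "content": "You are a coding agent."}]
for i in range(30):
    history.append({"role": "user", "content": f"turn {i}"})

trimmed = sliding_window(history, max_messages=20)
print(len(trimmed))           # system prompt + the 20 most recent turns
print(trimmed[1]["content"])  # oldest surviving user turn
```

Everything before the surviving window is simply gone, which is why sliding windows are usually combined with summarization or selective retrieval below.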
A second strategy is summarization: periodically condense older messages into a compact representation:
# See code/memory.py for the full implementation
def summarize_context(llm, messages, keep_recent=5):
"""Summarize older messages to fit context window."""
system = [m for m in messages if m["role"] == "system"]
others = [m for m in messages if m["role"] != "system"]
if len(others) <= keep_recent:
return messages
old_messages = others[:-keep_recent]
recent_messages = others[-keep_recent:]
summary = llm.generate(
f"Summarize this conversation history, preserving key "
f"facts, decisions, and context:\n\n"
f"{format_messages(old_messages)}"
)
return system + [
{"role": "system", "content": f"Previous conversation summary:\n{summary}"}
] + recent_messages
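The snippet above references a `format_messages` helper that lives in `code/memory.py`; one plausible implementation, together with a stub LLM so the end-to-end flow is testable, might look like this (the stub's canned summary is obviously not what a real model would return):

```python
def format_messages(messages):
    """One plausible format_messages (the real helper is in code/memory.py)."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

class StubLLM:
    """Stand-in for a real model client, returning a canned summary."""
    def generate(self, prompt):
        return "User is debugging auth; decided to patch session timeout."

def summarize_context(llm, messages, keep_recent=5):
    # Same logic as above: compress everything but the last few turns.
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if len(others) <= keep_recent:
        return messages
    summary = llm.generate(
        "Summarize this conversation history, preserving key facts, "
        "decisions, and context:\n\n" + format_messages(others[:-keep_recent])
    )
    return system + [
        {"role": "system", "content": f"Previous conversation summary:\n{summary}"}
    ] + others[-keep_recent:]

msgs = [{"role": "system", "content": "Be concise."}] + [
    {"role": "user", "content": f"turn {i}"} for i in range(10)
]
compacted = summarize_context(StubLLM(), msgs, keep_recent=3)
print(len(compacted))  # system prompt + summary message + recent turns
```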
A third strategy is selective retrieval: store all messages in a vector database and retrieve only the ones most relevant to the current query:
# See code/memory.py for the full implementation
class SelectiveMemory:
def __init__(self, vector_store):
self.store = vector_store
self.all_messages = []
def add_message(self, message):
self.all_messages.append(message)
self.store.add(
text=message["content"],
metadata={"index": len(self.all_messages) - 1}
)
def get_relevant_context(self, query, k=10):
"""Retrieve the most relevant past messages for a query."""
results = self.store.search(query, k=k)
indices = [r.metadata["index"] for r in results]
return [self.all_messages[i] for i in sorted(indices)]
Conversation state tracks where the user is in a multi-step process:
class ConversationState:
def __init__(self):
self.stage = "initial"
self.collected_info = {}
self.pending_actions = []
def transition(self, new_stage, **kwargs):
self.stage = new_stage
self.collected_info.update(kwargs)
def to_context(self):
return (
f"Current stage: {self.stage}\n"
f"Information collected: {self.collected_info}\n"
f"Pending actions: {self.pending_actions}"
)
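In practice each user turn advances the state machine, and `to_context()` is prepended to the next LLM request. A sketch with a hypothetical booking flow (the class body is repeated so the demo runs standalone; the stage names are made up):

```python
class ConversationState:
    """State machine for a multi-step flow (as defined above)."""
    def __init__(self):
        self.stage = "initial"
        self.collected_info = {}
        self.pending_actions = []

    def transition(self, new_stage, **kwargs):
        self.stage = new_stage
        self.collected_info.update(kwargs)

    def to_context(self):
        return (
            f"Current stage: {self.stage}\n"
            f"Information collected: {self.collected_info}\n"
            f"Pending actions: {self.pending_actions}"
        )

state = ConversationState()
state.transition("collecting_details", destination="Lisbon")
state.transition("confirming", dates="May 3-7")
print(state.to_context())
```

Because the collected information accumulates across transitions, the model never has to re-extract it from earlier turns.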
Task state tracks progress within a complex task:
class TaskState:
def __init__(self, goal):
self.goal = goal
self.plan = []
self.completed_steps = []
self.current_step = None
self.artifacts = {} # Files, data, etc.
def checkpoint(self):
"""Create a serializable checkpoint."""
return {
"goal": self.goal,
"plan": self.plan,
"completed": self.completed_steps,
"current": self.current_step,
"artifacts": self.artifacts
}
def restore(self, checkpoint):
"""Restore from a checkpoint."""
self.goal = checkpoint["goal"]
self.plan = checkpoint["plan"]
self.completed_steps = checkpoint["completed"]
self.current_step = checkpoint["current"]
self.artifacts = checkpoint["artifacts"]
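A checkpoint is only useful if it survives a crash, so persistence should be atomic. One sketch, assuming the checkpoint dict is JSON-serializable (binary artifacts would need to be written elsewhere and referenced by path):

```python
import json
import os
import tempfile

def save_checkpoint(checkpoint, path):
    """Persist a checkpoint atomically: write to a temp file, then
    rename over the target, so a crash mid-write never leaves a
    half-written checkpoint behind."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(checkpoint, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)

checkpoint = {
    "goal": "migrate database",
    "plan": ["dump", "transform", "load"],
    "completed": ["dump"],
    "current": "transform",
    "artifacts": {"dump_file": "backup.sql"},
}
save_checkpoint(checkpoint, "task_state.json")
restored = load_checkpoint("task_state.json")
print(restored["current"])
```

The restored dict plugs straight into `TaskState.restore`, letting an interrupted agent resume from the last completed step rather than starting over.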
For agents that need to remember across sessions, several storage backends can be used:
| Storage Type | Best For | Limitations |
|---|---|---|
| Vector Database (Pinecone, Chroma, Qdrant) | Semantic search over memories | Requires embedding model |
| Key-Value Store (Redis) | Fast fact lookup | No semantic search |
| Relational Database (PostgreSQL) | Structured data, complex queries | Rigid schema |
| Graph Database (Neo4j) | Relationship-rich knowledge | Complex setup |
| File System | Simple persistence | No search capability |
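The file-system row deserves more credit than it usually gets: for single-user agents, appending JSON lines to a file is often enough. A minimal sketch (the file name and record fields are illustrative), matching the table's caveat that retrieval is a linear scan with no semantic search:

```python
import json
from pathlib import Path

class FileMemory:
    """Simplest persistent memory: append JSON lines to a file.
    Durable and easy to inspect, but load() is a full linear scan
    and there is no semantic search."""
    def __init__(self, path):
        self.path = Path(path)

    def append(self, record):
        with self.path.open("a") as f:
            f.write(json.dumps(record) + "\n")

    def load(self):
        if not self.path.exists():
            return []
        with self.path.open() as f:
            return [json.loads(line) for line in f if line.strip()]

mem = FileMemory("agent_memory.jsonl")
mem.append({"type": "preference", "text": "user prefers tabs"})
mem.append({"type": "fact", "text": "repo uses Python 3.12"})
print(mem.load()[-1]["text"])
```

Graduating to a vector database later is straightforward: replay the JSONL file through the new store's `add` method.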
Memory adds complexity. Use it when:

- Users return across sessions and expect the agent to remember earlier context
- Tasks run long enough to outgrow a single context window
- The agent can improve by learning from past successes and failures
- Personalization (preferences, recurring facts) meaningfully improves outcomes

Avoid investing in memory when:

- Each task is self-contained and fits comfortably in the context window
- Interactions are one-shot, with no returning users
- A longer prompt or retrieval over static documents would do the job