Multi-agent collaboration is the pattern of having multiple AI agents — each with its own role, expertise, tools, and instructions — work together to accomplish tasks that are too complex or multifaceted for a single agent.
This is the most ambitious of the four core design patterns, and also the most variable in its outcomes. When it works, multi-agent systems produce results that far exceed what any single agent can achieve. When it doesn’t, the agents can produce a “cacophony” of conflicting messages and wasted computation.
It might seem counterintuitive that prompting the same LLM with different role descriptions should produce better results than a single, comprehensive prompt. Yet research and practice consistently show that it does, for several reasons:
It works — Ablation studies in the AutoGen paper show that multiple agents give superior performance to a single agent on complex tasks
Focused attention — Even though modern LLMs can process long contexts, their ability to truly attend to everything in a long, complex prompt is limited. An agent focused on one aspect of a task performs better than one juggling everything
Optimized prompts — Each agent’s prompt can be tailored to its specific subtask, emphasizing the criteria most relevant to that role
Decomposition framework — Multi-agent design gives developers a natural framework for breaking down complex tasks, much like assigning work to a team of specialists
Agents are arranged in a chain, where each agent’s output becomes the next agent’s input:
Researcher ──► Writer ──► Editor ──► Formatter
# See code/multi_agent.py for the full implementation
def sequential_pipeline(task, agents):
    """Execute agents in sequence, each building on the previous output."""
    current_input = task
    for agent in agents:
        current_input = agent.execute(current_input)
    return current_input
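As a quick sanity check, the pipeline can be exercised with stand-in agents. The StubAgent class below is a hypothetical test double, not the Agent class defined later in this chapter, and the pipeline function is repeated so the snippet runs on its own:

```python
# StubAgent is a hypothetical stand-in: each "agent" just appends a tag,
# making it easy to see how output threads into the next agent's input.
class StubAgent:
    def __init__(self, name, transform):
        self.name = name
        self.transform = transform

    def execute(self, text):
        return self.transform(text)

def sequential_pipeline(task, agents):
    """Execute agents in sequence, each building on the previous output."""
    current_input = task
    for agent in agents:
        current_input = agent.execute(current_input)
    return current_input

agents = [
    StubAgent("Researcher", lambda t: t + " | researched"),
    StubAgent("Writer", lambda t: t + " | drafted"),
    StubAgent("Editor", lambda t: t + " | edited"),
]
result = sequential_pipeline("topic", agents)
# result == "topic | researched | drafted | edited"
```

The same shape works unchanged with real LLM-backed agents, since the pipeline only relies on each agent exposing `execute`.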
Two or more agents discuss a topic, presenting different perspectives and converging on an answer:
Agent A ──► Agent B ──► Agent A ──► Agent B ──► Consensus
(argues     (counter-   (refines    (agrees
position)   argues)     position)   or refines)
# See code/multi_agent.py for the full implementation
def debate(llm, topic, perspectives, max_rounds=3):
    """Two agents debate a topic from different perspectives."""
    history = []
    agents = [
        {"role": p, "system": f"You argue from the perspective of: {p}"}
        for p in perspectives
    ]
    for _ in range(max_rounds):
        for agent in agents:
            response = llm.generate(
                f"Topic: {topic}\n"
                f"Discussion so far:\n{format_history(history)}\n"
                f"Provide your perspective as {agent['role']}.",
                system=agent["system"]
            )
            history.append({"role": agent["role"], "content": response})
    # Synthesize the full transcript into a balanced conclusion
    return llm.generate(
        f"Synthesize the following debate into a balanced conclusion:\n"
        f"{format_history(history)}"
    )
A manager agent coordinates multiple worker agents, assigning tasks and integrating results:
          ┌──────────┐
          │ Manager  │
          └────┬─────┘
     ┌─────────┼─────────┐
┌────▼───┐ ┌───▼────┐ ┌──▼─────┐
│Worker 1│ │Worker 2│ │Worker 3│
│(Code)  │ │(Test)  │ │(Docs)  │
└────────┘ └────────┘ └────────┘
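A minimal sketch of the hierarchical pattern, assuming each worker exposes the same `execute` interface as the other examples. The `Manager` class, its hard-coded `plan` step, and `EchoWorker` are illustrative stand-ins; a real manager would ask an LLM to decompose the task:

```python
# A Manager splits a task into named subtasks, assigns each to a worker,
# and integrates the results. All names here are illustrative.
class Manager:
    def __init__(self, workers):
        self.workers = workers  # e.g. {"code": ..., "test": ..., "docs": ...}

    def plan(self, task):
        # A real manager would ask an LLM to decompose the task;
        # here we hard-code one subtask per worker.
        return {name: f"{task} [{name}]" for name in self.workers}

    def run(self, task):
        results = {}
        for name, subtask in self.plan(task).items():
            results[name] = self.workers[name].execute(subtask)
        # Integration step: combine worker outputs into one deliverable.
        return "\n".join(results[name] for name in sorted(results))

class EchoWorker:
    def execute(self, subtask):
        return f"done: {subtask}"

manager = Manager({"code": EchoWorker(), "test": EchoWorker(), "docs": EchoWorker()})
report = manager.run("build feature X")
```

The key design choice is that only the manager sees the whole task; each worker sees just its subtask, which keeps every prompt short and focused.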
Multiple agents share a workspace and contribute asynchronously:
┌──────────────────────────────┐
│       Shared Workspace       │
│  ┌─────┐  ┌─────┐            │
│  │Doc A│  │Doc B│  ...       │
│  └─────┘  └─────┘            │
└────┬───────┬───────┬─────────┘
     │       │       │
 ┌───▼─┐ ┌───▼──┐ ┌──▼───┐
 │Agent│ │Agent │ │Agent │
 │  A  │ │  B   │ │  C   │
 └─────┘ └──────┘ └──────┘
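A minimal blackboard sketch, using a plain dict as the workspace. The field names (`findings`, `draft`) and the agents-as-functions design are illustrative assumptions; each agent acts only when its inputs are present, so the loop order does not matter:

```python
# Agents watch a shared workspace and contribute when their inputs exist.
# Each returns True if it made progress, False otherwise.
workspace = {"task": "summarize Q1 sales"}

def researcher(ws):
    if "task" in ws and "findings" not in ws:
        ws["findings"] = f"findings for: {ws['task']}"
        return True
    return False

def writer(ws):
    if "findings" in ws and "draft" not in ws:
        ws["draft"] = f"draft based on {ws['findings']}"
        return True
    return False

agents = [writer, researcher]  # order doesn't matter: each acts when ready
progress = True
while progress:
    progress = any(agent(workspace) for agent in agents)
```

The loop stops once no agent can make progress, which is how blackboard systems naturally terminate: the shared state, not a central controller, drives the flow of work.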
One of the most vivid examples of multi-agent collaboration is ChatDev (Qian et al., 2023), which simulates an entire software company using LLM agents playing roles such as CEO, CTO, programmer, reviewer, and tester. These agents communicate through structured dialogues, and the entire software development lifecycle — from requirements to testing — is carried out by the agent team.
# See code/multi_agent.py for the full implementation
class Agent:
    def __init__(self, name, role, goal, backstory, tools=None, llm=None):
        self.name = name
        self.role = role
        self.goal = goal
        self.backstory = backstory
        self.tools = tools or []
        self.llm = llm

    def execute(self, task, context=None):
        system_prompt = (
            f"You are {self.name}, a {self.role}.\n"
            f"Your goal: {self.goal}\n"
            f"Background: {self.backstory}"
        )
        full_prompt = f"Task: {task}"
        if context:
            full_prompt = f"Context:\n{context}\n\n{full_prompt}"
        return self.llm.generate(full_prompt, system=system_prompt,
                                 tools=self.tools)
# See code/multi_agent.py for the full implementation
class MultiAgentSystem:
    def __init__(self, agents, strategy="sequential"):
        self.agents = {a.name: a for a in agents}
        self.strategy = strategy
        self.message_log = []

    def run(self, task):
        if self.strategy == "sequential":
            return self._run_sequential(task)
        elif self.strategy == "debate":
            return self._run_debate(task)
        elif self.strategy == "hierarchical":
            return self._run_hierarchical(task)
        raise ValueError(f"Unknown strategy: {self.strategy}")

    def _run_sequential(self, task):
        result = task
        for agent in self.agents.values():
            agent_input = result  # log what this agent saw, not the original task
            result = agent.execute(agent_input)
            self.message_log.append({
                "agent": agent.name,
                "input": agent_input,
                "output": result
            })
        return result
Agents send messages directly to specific other agents:
manager.send(worker_1, "Please analyze the sales data for Q1")
An agent sends a message to all other agents:
coordinator.broadcast("New requirement: the report must include charts")
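Both direct messaging and broadcast can be sketched with a small in-memory router. `MessageBus`, `MessagingAgent`, and the `inbox` field are hypothetical names for illustration, not from a specific framework:

```python
# A minimal message router: send() delivers to one agent, broadcast()
# delivers to every agent except the sender.
class MessagingAgent:
    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, sender, content):
        self.inbox.append({"from": sender, "content": content})

class MessageBus:
    def __init__(self):
        self.agents = {}

    def register(self, agent):
        self.agents[agent.name] = agent

    def send(self, sender, recipient, content):
        self.agents[recipient].receive(sender, content)

    def broadcast(self, sender, content):
        for name, agent in self.agents.items():
            if name != sender:
                agent.receive(sender, content)

bus = MessageBus()
for name in ("manager", "worker_1", "worker_2"):
    bus.register(MessagingAgent(name))

bus.send("manager", "worker_1", "Please analyze the sales data for Q1")
bus.broadcast("manager", "New requirement: the report must include charts")
```

In a real system each agent would process its inbox with an LLM call; the routing layer stays the same.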
Agents read and write to a shared workspace:
shared_state["research_results"] = researcher.execute(topic)
writer_output = writer.execute(shared_state["research_results"])
Agents subscribe to events and react when relevant events occur:
@on_event("code_written")
def test_agent_handler(code):
    test_results = run_tests(code)
    emit_event("tests_completed", test_results)
Several frameworks, such as AutoGen, CrewAI, and LangGraph, make multi-agent systems easier to build.
We explore these frameworks in detail in Chapter 14.