Chapter 9 – Parallelization Pattern

Simultaneous Execution for Speed and Quality

Parallelization is the pattern of running multiple LLM calls simultaneously and combining their outputs. Unlike prompt chaining (sequential) or routing (one path), parallelization harnesses concurrent execution to reduce latency, improve quality, or increase confidence.

Two Variants

Anthropic identifies two key forms of parallelization:

Sectioning

Breaking a task into independent subtasks that can run in parallel:

                    ┌──────────┐
                    │  Input   │
                    └──┬──┬──┬─┘
                       │  │  │
              ┌────────┘  │  └────────┐
              │           │           │
        ┌─────▼────┐ ┌───▼─────┐ ┌───▼─────┐
        │ Section A│ │Section B│ │Section C│
        │  (LLM)   │ │  (LLM)  │ │  (LLM)  │
        └─────┬────┘ └───┬─────┘ └───┬─────┘
              │           │           │
              └────────┐  │  ┌────────┘
                       │  │  │
                    ┌──▼──▼──▼─┐
                    │ Aggregate│
                    └──────────┘

Voting

Running the same task multiple times to get diverse outputs and selecting the best:

                    ┌──────────┐
                    │  Input   │
                    └──┬──┬──┬─┘
                       │  │  │
              ┌────────┘  │  └────────┐
              │           │           │
        ┌─────▼────┐ ┌───▼─────┐ ┌───▼─────┐
        │  Run 1   │ │  Run 2  │ │  Run 3  │
        │  (LLM)   │ │  (LLM)  │ │  (LLM)  │
        └─────┬────┘ └───┬─────┘ └───┬─────┘
              │           │           │
              └────────┐  │  ┌────────┘
                       │  │  │
                    ┌──▼──▼──▼─┐
                    │   Vote   │
                    └──────────┘

Sectioning: Divide and Conquer

Sectioning works when the task can be split into independent subtasks that don’t depend on each other.

# See code/parallelization.py for the full implementation
import asyncio

async def parallel_sections(llm, task, sections):
    """Execute independent sections in parallel, then aggregate."""
    
    async def run_section(section_prompt):
        return await llm.agenerate(section_prompt)
    
    tasks = [run_section(s) for s in sections]
    results = await asyncio.gather(*tasks)
    
    # Aggregate results into one coherent answer to the original task
    combined = "\n\n".join(results)
    final = await llm.agenerate(
        f"Task: {task}\n\n"
        f"Combine these sections into a coherent whole:\n{combined}"
    )
    return final


# Example: Analyze a company from multiple angles simultaneously
task = "Produce an investment analysis of Company X."
sections = [
    "Analyze the financial performance of Company X based on...",
    "Analyze the competitive landscape for Company X...",
    "Analyze the management team and corporate culture of Company X...",
    "Analyze the technology and product roadmap of Company X...",
]

result = asyncio.run(parallel_sections(llm, task, sections))

Guardrails as Sectioning

A particularly important application of sectioning: running guardrails in parallel with the main task. While one LLM call generates the response, another screens for safety issues:

# See code/parallelization.py for the full implementation

async def guarded_response(llm, user_query):
    """Generate response and check safety in parallel."""
    
    async def generate():
        return await llm.agenerate(
            f"Answer this query helpfully:\n{user_query}"
        )
    
    async def check_safety():
        return await llm.agenerate(
            f"Is this query requesting anything harmful, illegal, "
            f"or inappropriate? Query: {user_query}\n"
            f"Answer YES or NO with brief explanation.",
            temperature=0
        )
    
    response, safety = await asyncio.gather(generate(), check_safety())
    
    # The prompt asks for a leading YES/NO, so check the start of the
    # answer rather than searching the whole explanation for "YES"
    if safety.strip().upper().startswith("YES"):
        return "I'm sorry, I can't help with that request."
    return response

This is faster than sequential checking (check first, then respond) and more reliable than asking a single LLM to both guard and respond.

Voting: Strength in Numbers

Voting runs the same task multiple times (often with different temperatures, prompts, or models) and selects the best answer through consensus.

# See code/parallelization.py for the full implementation

from collections import Counter

async def voting(llm, task, num_votes=5, temperature=0.7):
    """Run task multiple times and take majority vote."""
    
    tasks = [
        llm.agenerate(task, temperature=temperature)
        for _ in range(num_votes)
    ]
    responses = await asyncio.gather(*tasks)
    
    # For classification tasks: majority vote on the stripped answers
    votes = Counter(r.strip() for r in responses)
    winner, count = votes.most_common(1)[0]
    
    confidence = count / num_votes
    return winner, confidence
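Tallying raw response strings only works when outputs match verbatim; in practice "Positive." and "positive" should count as the same vote. A minimal normalization sketch (the `majority_vote` and `label_only` helpers are hypothetical, not part of the chapter's code):

```python
from collections import Counter

def majority_vote(responses, normalize=None):
    """Tally responses after normalization; return the winning
    answer and its confidence (share of the vote)."""
    normalize = normalize or (lambda r: r.strip().lower())
    votes = Counter(normalize(r) for r in responses)
    winner, count = votes.most_common(1)[0]
    return winner, count / len(responses)

def label_only(response):
    # Keep only the first word, stripped of trailing punctuation
    return response.split()[0].strip(".,!").lower()

winner, confidence = majority_vote(
    ["Positive.", "positive", "Negative", "Positive!"],
    normalize=label_only,
)
# winner == "positive", confidence == 0.75
```

The normalization function should match your prompt: if the prompt asks for a single label, strip punctuation and case; if it asks for structured output, parse that structure before voting.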

Best-of-N Sampling

A variant where you generate N responses and use a separate evaluation step to pick the best one:

# See code/parallelization.py for the full implementation

async def best_of_n(llm, task, n=3):
    """Generate N responses and pick the best."""
    
    # Generate N candidates
    tasks = [llm.agenerate(task, temperature=0.8) for _ in range(n)]
    candidates = await asyncio.gather(*tasks)
    
    # Evaluate and pick the best
    evaluation_prompt = (
        f"Task: {task}\n\n"
        f"Here are {n} candidate responses:\n\n"
    )
    for i, c in enumerate(candidates):
        evaluation_prompt += f"--- Candidate {i+1} ---\n{c}\n\n"
    
    evaluation_prompt += (
        f"Which candidate best accomplishes the task? "
        f"Consider accuracy, completeness, and clarity. "
        f"Respond with just the number."
    )
    
    # The LLM may pad its answer with whitespace or pick an
    # out-of-range number; strip before parsing and clamp the index
    choice = await llm.agenerate(evaluation_prompt, temperature=0)
    best_idx = min(max(int(choice.strip()) - 1, 0), n - 1)
    return candidates[best_idx]

Parallel Evaluation

Parallelization is particularly useful for evaluating LLM outputs across multiple dimensions simultaneously:

# See code/parallelization.py for the full implementation

async def parallel_evaluation(llm, content, criteria):
    """Evaluate content across multiple criteria in parallel."""
    
    async def evaluate_criterion(criterion):
        return await llm.agenerate(
            f"Evaluate this content on '{criterion}' "
            f"(score 1-10 with explanation):\n\n{content}"
        )
    
    results = await asyncio.gather(
        *[evaluate_criterion(c) for c in criteria]
    )
    
    return dict(zip(criteria, results))


criteria = ["accuracy", "clarity", "completeness", "tone"]
scores = asyncio.run(parallel_evaluation(llm, article, criteria))
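The evaluations come back as free text, so turning them into numeric scores needs a parsing step. A hedged sketch assuming the "score 1-10" format from the prompt above; `extract_score` is a hypothetical helper, and real parsing should match your exact prompt format:

```python
import re

def extract_score(evaluation, default=None):
    """Pull the first integer in the range 1-10 out of a
    free-text evaluation, or return default if none is found."""
    match = re.search(r"\b([1-9]|10)\b", evaluation)
    return int(match.group(1)) if match else default

extract_score("Clarity: 8/10. The prose is crisp and direct.")  # → 8
extract_score("The evaluation contains no numeric score")       # → None
```

Asking the model to emit a structured format (e.g. JSON with a `score` field) is more robust than regex extraction, at the cost of a slightly longer prompt.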

When to Use Parallelization

Sectioning is effective when:

  - The subtasks are genuinely independent of one another
  - Each aspect benefits from an LLM call's focused attention (as with parallel guardrails)
  - Latency matters and a sequential chain would be too slow

Voting is effective when:

  - The task has a discrete answer (classification, yes/no checks)
  - Single runs are unreliable and diverse samples raise accuracy
  - You want a confidence estimate alongside the answer

When to Avoid

  - Subtasks depend on each other's outputs (use prompt chaining instead)
  - API spend is a hard constraint, since parallel calls multiply cost
  - A single well-prompted call already performs adequately

Practical Tips

  1. Use async/await — Python’s asyncio is the natural fit for parallel LLM calls
  2. Set timeouts — One slow call shouldn’t block the entire batch
  3. Handle partial failures — If one parallel call fails, decide whether to retry, skip, or fail entirely
  4. Monitor costs — N parallel calls cost N times as much as a single call
  5. Start with sectioning — It’s more predictable than voting and often provides the most benefit
  6. Combine with other patterns — Parallelization works well alongside routing (route then handle in parallel) and prompt chaining (parallelize independent steps)
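The timeout and partial-failure tips above can be sketched together. `gather_with_timeout` is a hypothetical wrapper, shown here with stand-in coroutines rather than real LLM calls:

```python
import asyncio

async def gather_with_timeout(coros, timeout=30.0):
    """Run coroutines concurrently; a slow or failing call yields
    None instead of blocking or crashing the whole batch."""
    async def guarded(coro):
        try:
            return await asyncio.wait_for(coro, timeout=timeout)
        except Exception:
            # Timeout or call failure: skip this result
            return None
    return await asyncio.gather(*(guarded(c) for c in coros))

async def main():
    async def fast():
        return "ok"
    async def slow():
        await asyncio.sleep(10)
        return "late"
    return await gather_with_timeout([fast(), slow()], timeout=0.1)

results = asyncio.run(main())
# results == ["ok", None]
```

Returning `None` for failures is the "skip" strategy from tip 3; a retry strategy would re-submit the failed coroutine's prompt, and a fail-fast strategy would re-raise instead of swallowing the exception.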
