Chapter 8 – Routing Pattern

Directing Inputs to Specialized Handlers

Routing classifies an input and directs it to a specialized follow-up process. It’s the pattern you use when different types of inputs require fundamentally different handling — different prompts, different tools, different models, or even different workflows.

The Pattern

                        ┌──────────────┐
                        │   Input      │
                        └──────┬───────┘
                               │
                        ┌──────▼───────┐
                        │   Router     │
                        │ (Classify)   │
                        └──┬───┬───┬───┘
                           │   │   │
                  ┌────────┘   │   └────────┐
                  │            │            │
            ┌──────▼───┐  ┌─────▼───┐  ┌─────▼────┐
            │ Handler  │  │ Handler │  │ Handler  │
            │    A     │  │    B    │  │    C     │
            │(General) │  │(Refund) │  │ (Tech)   │
            └──────────┘  └─────────┘  └──────────┘

Routing allows separation of concerns. Instead of building one monolithic prompt that tries to handle everything (and does nothing particularly well), you build specialized handlers that excel at their specific task.

Basic Routing Implementation

# See code/routing.py for the full implementation

def route_query(llm, query, routes, handlers):
    """Classify a query and route it to the appropriate handler.

    `routes` maps each category name to a description shown to the
    classifier; `handlers` maps category names to callables and must
    include a "default" fallback.
    """

    route_descriptions = "\n".join(
        f"- {name}: {desc}" for name, desc in routes.items()
    )

    classification = llm.generate(
        f"Classify the following query into exactly one category.\n\n"
        f"Categories:\n{route_descriptions}\n\n"
        f"Query: {query}\n\n"
        f"Respond with just the category name."
    )

    category = classification.strip()
    handler = handlers.get(category, handlers["default"])
    return handler(query)
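
To make the flow concrete, here is a self-contained sketch that wires a classifier, routes, and handlers together. The `StubLLM` class is a hypothetical stand-in for a real LLM client (it fakes classification by inspecting the query text), and the route names and handlers are invented for illustration:

```python
class StubLLM:
    """Hypothetical LLM stub: fakes classification for demonstration."""
    def generate(self, prompt, temperature=0):
        # Look only at the "Query:" portion of the prompt.
        query = prompt.split("Query:")[1].lower()
        return "refund" if "refund" in query else "general"

def route_query(llm, query, routes, handlers):
    """Classify a query, then dispatch to the matching handler."""
    route_descriptions = "\n".join(
        f"- {name}: {desc}" for name, desc in routes.items()
    )
    classification = llm.generate(
        f"Classify the following query into exactly one category.\n\n"
        f"Categories:\n{route_descriptions}\n\n"
        f"Query: {query}\n\n"
        f"Respond with just the category name."
    )
    category = classification.strip()
    handler = handlers.get(category, handlers["default"])
    return handler(query)

routes = {
    "refund": "requests for refunds or returns",
    "general": "anything else",
}
handlers = {
    "refund": lambda q: f"[refund flow] {q}",
    "general": lambda q: f"[general flow] {q}",
    "default": lambda q: f"[general flow] {q}",
}

print(route_query(StubLLM(), "I want a refund for my order", routes, handlers))
# → [refund flow] I want a refund for my order
```

In production the stub would be a real model client, but the dispatch shape stays the same: classification produces a string key, and a dictionary lookup with a default selects the handler.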

Routing Strategies

LLM-Based Classification

The router uses an LLM to classify the input. This is the most flexible approach, capable of handling nuanced or ambiguous inputs:

# See code/routing.py for the full implementation

def llm_router(llm, query):
    """Use an LLM to classify and route queries."""
    result = llm.generate(
        f"Classify this customer query into one of these categories:\n"
        f"1. BILLING - questions about charges, invoices, payments\n"
        f"2. TECHNICAL - bugs, errors, how-to questions\n"
        f"3. SALES - pricing, plans, features\n"
        f"4. GENERAL - everything else\n\n"
        f"Query: {query}\n"
        f"Category:",
        temperature=0  # Deterministic for classification
    )
    return result.strip()

Keyword-Based Routing

For simpler cases, keyword matching can route inputs without an LLM call:

def keyword_router(query):
    """Route based on keywords - fast and cheap."""
    query_lower = query.lower()
    
    if any(w in query_lower for w in ["bill", "charge", "invoice", "payment"]):
        return "BILLING"
    elif any(w in query_lower for w in ["error", "bug", "crash", "broken"]):
        return "TECHNICAL"
    elif any(w in query_lower for w in ["price", "plan", "upgrade", "feature"]):
        return "SALES"
    else:
        return "GENERAL"

Embedding-Based Routing

Use vector similarity to find the closest route:

# See code/routing.py for the full implementation

class EmbeddingRouter:
    def __init__(self, routes, embedding_model):
        self.routes = routes
        self.model = embedding_model
        self.route_embeddings = {
            name: self.model.embed(desc)
            for name, desc in routes.items()
        }
    
    def route(self, query):
        query_embedding = self.model.embed(query)
        best_route = max(
            self.route_embeddings.items(),
            key=lambda x: cosine_similarity(query_embedding, x[1])
        )
        return best_route[0]
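
The snippet above assumes a `cosine_similarity` helper and an embedding model. Here is a minimal runnable sketch of both; the `BagOfWordsModel` is a hypothetical toy (real systems would use a proper embedding model), chosen only so the example runs without dependencies:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class BagOfWordsModel:
    """Toy embedding model: counts words from a fixed vocabulary."""
    VOCAB = ["bill", "payment", "error", "crash", "price", "plan"]
    def embed(self, text):
        words = text.lower().split()
        return [words.count(w) for w in self.VOCAB]

class EmbeddingRouter:
    def __init__(self, routes, embedding_model):
        self.routes = routes
        self.model = embedding_model
        self.route_embeddings = {
            name: self.model.embed(desc)
            for name, desc in routes.items()
        }

    def route(self, query):
        query_embedding = self.model.embed(query)
        best_route = max(
            self.route_embeddings.items(),
            key=lambda x: cosine_similarity(query_embedding, x[1])
        )
        return best_route[0]

router = EmbeddingRouter(
    {"BILLING": "bill payment invoice charge",
     "TECHNICAL": "error crash bug broken"},
    BagOfWordsModel(),
)
print(router.route("there is an error and a crash in the app"))  # → TECHNICAL
```

The key property of this strategy: route descriptions are embedded once at construction time, so each query costs one embedding call plus cheap vector math, with no LLM call at all.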

Model-Based Routing

Route to different LLM models based on task complexity — use cheaper models for simple queries and more capable models for hard ones:

# See code/routing.py for the full implementation

def model_router(llm_classifier, query):
    """Route to different models based on query complexity."""
    complexity = llm_classifier.generate(
        f"Rate the complexity of answering this query "
        f"(SIMPLE, MEDIUM, COMPLEX):\n{query}",
        temperature=0
    )
    
    model_map = {
        "SIMPLE": "claude-haiku",      # Fast, cheap
        "MEDIUM": "claude-sonnet",     # Balanced
        "COMPLEX": "claude-opus",      # Most capable
    }
    
    selected_model = model_map.get(complexity.strip(), "claude-sonnet")
    return selected_model
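
A sketch of this strategy end to end, with a stubbed classifier. The `StubClassifier` is hypothetical (it rates queries by word count purely for demonstration; a real classifier would judge semantic difficulty):

```python
MODEL_MAP = {
    "SIMPLE": "claude-haiku",      # Fast, cheap
    "MEDIUM": "claude-sonnet",     # Balanced
    "COMPLEX": "claude-opus",      # Most capable
}

class StubClassifier:
    """Hypothetical classifier stub: long queries rate as COMPLEX."""
    def generate(self, prompt, temperature=0):
        query = prompt.split(":\n", 1)[1]
        return "COMPLEX" if len(query.split()) > 20 else "SIMPLE"

def select_model(classifier, query):
    """Pick a model name based on classified query complexity."""
    label = classifier.generate(
        f"Rate the complexity of answering this query "
        f"(SIMPLE, MEDIUM, COMPLEX):\n{query}",
        temperature=0,
    ).strip()
    # Unknown labels fall back to the balanced middle tier
    return MODEL_MAP.get(label, "claude-sonnet")

print(select_model(StubClassifier(), "What time is it?"))  # → claude-haiku
```

Note the defensive `.get` with a mid-tier default: if the classifier ever emits an unexpected label, the query still gets answered by a reasonable model rather than crashing the router.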

Multi-Level Routing

Complex systems may use hierarchical routing — a first router classifies at a high level, and sub-routers handle finer distinctions:

                     ┌─────────────┐
                     │  Top Router │
                     └──┬──────┬───┘
                        │      │
              ┌─────────▼─┐   ┌▼──────────┐
              │  Support  │   │   Sales   │
              │  Router   │   │   Router  │
              └──┬────┬───┘   └──┬────┬───┘
                 │    │          │    │
               ┌─▼─┐ ┌▼──┐    ┌──▼┐ ┌─▼─┐
              │Bug│ │How│    │New│ │Up │
              │Fix│ │-To│    │Biz│ │grd│
              └───┘ └───┘    └───┘ └───┘

# See code/routing.py for the full implementation

def hierarchical_route(llm, query):
    """Two-level routing for complex classification."""
    
    # Level 1: High-level category
    category = llm.generate(
        f"Classify as SUPPORT or SALES:\n{query}",
        temperature=0
    ).strip()
    
    # Level 2: Sub-category
    if category == "SUPPORT":
        sub = llm.generate(
            f"Classify this support query as BUG, HOW_TO, or ACCOUNT:\n{query}",
            temperature=0
        ).strip()
    elif category == "SALES":
        sub = llm.generate(
            f"Classify this sales query as NEW_BUSINESS, UPGRADE, or RENEWAL:\n{query}",
            temperature=0
        ).strip()
    else:
        # Fallback: the classifier returned an unexpected label
        sub = "UNKNOWN"
    
    return category, sub
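
A self-contained stubbed run of the two-level flow. The `TwoLevelStub` client is hypothetical: it fakes both classification calls by pattern-matching the prompts, so the example is runnable without a real model:

```python
class TwoLevelStub:
    """Hypothetical LLM stub answering both classification levels."""
    def generate(self, prompt, temperature=0):
        if "SUPPORT or SALES" in prompt:
            return "SUPPORT" if "crash" in prompt.lower() else "SALES"
        if "BUG, HOW_TO, or ACCOUNT" in prompt:
            return "BUG"
        return "NEW_BUSINESS"

def hierarchical_route(llm, query):
    """Two-level routing: coarse category first, then sub-category."""
    category = llm.generate(
        f"Classify as SUPPORT or SALES:\n{query}", temperature=0
    ).strip()
    if category == "SUPPORT":
        sub = llm.generate(
            f"Classify this support query as BUG, HOW_TO, or ACCOUNT:\n{query}",
            temperature=0
        ).strip()
    else:
        sub = llm.generate(
            f"Classify this sales query as NEW_BUSINESS, UPGRADE, or RENEWAL:\n{query}",
            temperature=0
        ).strip()
    return category, sub

print(hierarchical_route(TwoLevelStub(), "the app crashes on login"))
# → ('SUPPORT', 'BUG')
```

The trade-off is cost: two classification calls per query instead of one, in exchange for each classifier facing a smaller, easier set of choices.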

When to Use Routing

  1. Inputs fall into distinct categories that need different prompts, tools, models, or workflows
  2. A single monolithic prompt tries to handle everything and does nothing particularly well
  3. You can cut costs by sending simple queries to cheaper models

When to Avoid

  1. One well-crafted prompt already handles all input types adequately
  2. Categories overlap so much that misclassification would be common
  3. The classification step adds more latency or cost than the specialization saves

Practical Tips

  1. Use low temperature for classification — You want deterministic, consistent routing
  2. Include a default/fallback route — Not every input will fit neatly into a category
  3. Monitor route distribution — Track which routes are being hit to identify imbalances or unexpected patterns
  4. Test edge cases — Ambiguous queries that could go to multiple routes are where routing fails
  5. Consider cost — An LLM classification call has a cost; for high-volume applications, consider keyword or embedding-based routing first
  6. Combine strategies — Use keyword routing for obvious cases and LLM routing for the rest
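
Tip 6 can be sketched as a two-stage router: cheap keyword matching catches the obvious cases, and an LLM is called only for what remains. The `FallbackLLM` stub and the keyword table are hypothetical, for illustration:

```python
KEYWORDS = {
    "BILLING": ["bill", "charge", "invoice", "payment"],
    "TECHNICAL": ["error", "bug", "crash", "broken"],
}

class FallbackLLM:
    """Hypothetical LLM stub, consulted only when no keyword matches."""
    def generate(self, prompt, temperature=0):
        return "GENERAL"

def hybrid_router(llm, query):
    """Keyword routing for obvious cases; LLM classification for the rest."""
    q = query.lower()
    for category, words in KEYWORDS.items():
        if any(w in q for w in words):
            return category  # matched a keyword: no LLM call needed
    # Ambiguous query: fall back to an LLM classification
    return llm.generate(
        f"Classify this query as BILLING, TECHNICAL, or GENERAL:\n{query}",
        temperature=0,
    ).strip()

print(hybrid_router(FallbackLLM(), "my invoice is wrong"))  # → BILLING
print(hybrid_router(FallbackLLM(), "tell me a joke"))       # → GENERAL
```

In high-volume systems this hybrid often cuts classification spend substantially, since the fraction of queries containing an unambiguous keyword never pays for an LLM call.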
