Chapter 10: Advanced Topics


This chapter covers advanced MCP features: OAuth authentication for remote servers, sampling (servers requesting LLM completions), robust error handling, logging, and production deployment patterns.


OAuth 2.0 Authentication

Remote MCP servers (HTTP/SSE) are accessible over the network, so they need authentication. The MCP specification supports OAuth 2.0 for this purpose.

Server-Side: Adding Authentication

The Python SDK includes OAuth middleware for FastMCP servers:

from mcp.server.fastmcp import FastMCP
from mcp.server.auth import OAuthMiddleware

mcp = FastMCP("secure-server")

# Attach OAuth middleware
app = mcp.sse_app()
app = OAuthMiddleware(app, config={
    "authorization_server": "https://auth.example.com",
    "client_id": "my-mcp-server",
    "required_scopes": ["mcp:read", "mcp:write"],
})

Client-Side: Handling Auth

When a client connects to an authenticated server, the server returns a 401 response with OAuth metadata. The host application (Claude Desktop, Cursor, etc.) handles the OAuth flow — presenting a login page, obtaining a token, and attaching it to subsequent requests.

For custom clients, implement the OAuth 2.0 PKCE flow to obtain and refresh tokens.
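The PKCE portion of that flow can be sketched with the standard library alone. This is a minimal illustration of generating the code_verifier and its S256 code_challenge per RFC 7636; the authorization endpoint, client ID, and token exchange are omitted and would come from the server's OAuth metadata:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge."""
    # 32 random bytes -> 43-char URL-safe verifier (RFC 7636 allows 43-128 chars)
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# The client sends `challenge` (with code_challenge_method=S256) in the
# authorization request, then presents `verifier` when exchanging the code
# for a token, proving it initiated the flow.
```

Because the verifier never travels with the authorization request, an attacker who intercepts the authorization code cannot redeem it.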

API Key Authentication (Simple Alternative)

For internal tools, a simple API key approach is often sufficient:

import secrets

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import Response

class APIKeyMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, api_key: str):
        super().__init__(app)
        self.api_key = api_key

    async def dispatch(self, request, call_next):
        key = request.headers.get("x-api-key", "")
        # Constant-time comparison avoids leaking key prefixes via timing
        if not secrets.compare_digest(key, self.api_key):
            return Response("Unauthorized", status_code=401)
        return await call_next(request)

Sampling: Servers Requesting LLM Completions

Sampling is an advanced MCP feature that allows a server to request a completion from the LLM connected to the client. This enables powerful patterns: the server can delegate classification, summarization, or other language tasks to the client's model without bundling its own LLM API key.

How Sampling Works

The server sends a sampling/createMessage request to the client. The client passes it to the host, which passes it to the LLM. The result flows back to the server.

from mcp.server.fastmcp import FastMCP
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("sampling-demo")

@mcp.tool()
async def classify_and_route(text: str) -> str:
    """Classify text and route it to the right handler."""
    # Ask the connected LLM to classify the text
    result = await mcp.get_context().session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(
                    type="text",
                    text=f"Classify this as 'bug', 'feature', or 'question': {text}",
                ),
            )
        ],
        max_tokens=10,
    )
    category = result.content.text.strip().lower()
    return handle_category(category, text)

Important: Sampling requires the host to support it (not all clients do). Check session.capabilities.sampling before using it.


Context and Session Access

Inside a tool handler, you can access the current MCP context for advanced operations:

from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("context-demo")

@mcp.tool()
async def long_running_tool(ctx: Context, data: str) -> str:
    """A tool that reports progress."""
    await ctx.report_progress(0, 100, "Starting...")

    result = process_step_1(data)
    await ctx.report_progress(50, 100, "Halfway done...")

    result = process_step_2(result)
    await ctx.report_progress(100, 100, "Complete")

    return result

The Context object provides:

  - report_progress() for progress updates during long-running operations
  - log() for messages that are forwarded to the client
  - session, the underlying session used for advanced requests such as sampling

To receive a Context, add a parameter annotated with the Context type to your tool function; FastMCP detects the annotation and injects the object automatically.


Robust Error Handling

Distinguishing Error Types

MCP has two error channels:

  1. Protocol errors — the request itself is malformed. Raise McpError with an error code.
  2. Tool errors — the tool ran but the operation failed. Return an error in the content.

from mcp.shared.exceptions import McpError
from mcp.types import ErrorData, INVALID_PARAMS

@mcp.tool()
def get_record(record_id: int) -> str:
    if record_id <= 0:
        # Protocol-level error: invalid input
        raise McpError(ErrorData(code=INVALID_PARAMS, message="record_id must be positive"))

    record = db.find(record_id)
    if record is None:
        # Tool-level error: operation failed, but the call was valid
        return f"Error: no record with ID {record_id}"

    return record.to_json()

Handling Timeouts

For tools that call external services, always set timeouts:

import httpx

@mcp.tool()
async def call_api(endpoint: str) -> str:
    """Call an external API with a 10-second timeout."""
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            response = await client.get(endpoint)
            response.raise_for_status()
            return response.text
    except httpx.TimeoutException:
        return "Error: request timed out after 10 seconds"
    except httpx.HTTPStatusError as e:
        return f"Error: HTTP {e.response.status_code}"
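Timeouts pair naturally with retries for transient failures. The helper below is a library-agnostic sketch of exponential backoff; the function name, attempt count, and delays are illustrative choices, not part of any SDK:

```python
import asyncio

async def with_retries(fn, *, attempts: int = 3, base_delay: float = 0.5):
    """Run an async callable, retrying on exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the last error
            # Wait 0.5s, 1s, 2s, ... between attempts
            await asyncio.sleep(base_delay * (2 ** attempt))
```

Inside a tool, the external call would be wrapped as, for example, `await with_retries(lambda: client.get(endpoint))`. Retry only operations that are safe to repeat.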

Logging and Observability

Structured Logging

Use Python’s logging module with structured output:

import logging, sys, json

class JSONFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            "tool": getattr(record, "tool", None),
        })

handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(JSONFormatter())
logging.getLogger().addHandler(handler)
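With the handler installed, the `extra` keyword is how a log call attaches the tool name that the formatter reads off the record. A self-contained sketch (the formatter is repeated from above so this snippet runs on its own; the logger name is arbitrary):

```python
import json
import logging
import sys

class JSONFormatter(logging.Formatter):
    """Same formatter as above, repeated so this snippet is self-contained."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            "tool": getattr(record, "tool", None),
        })

handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("my-mcp-server")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# `extra` sets attributes on the LogRecord, which the formatter picks up
logger.info("tool invoked", extra={"tool": "get_record"})
# stderr: {"ts": "...", "level": "INFO", "msg": "tool invoked", "tool": "get_record"}
```

Logging to stderr matters for stdio servers: stdout carries the JSON-RPC stream, so anything printed there would corrupt the protocol.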

Sending Logs to the Client

Use ctx.log() to send log messages that appear in the host application:

@mcp.tool()
async def process(ctx: Context, data: str) -> str:
    await ctx.log("info", f"Processing {len(data)} bytes of data")
    result = do_work(data)
    await ctx.log("info", "Processing complete")
    return result

Production Deployment Patterns

Docker

Containerize your server for consistent deployment:

FROM python:3.11-slim
WORKDIR /app
COPY pyproject.toml .
COPY src/ src/
RUN pip install .
EXPOSE 8000
CMD ["python", "-m", "my_server", "--transport", "sse"]

Health Checks

For HTTP servers, add a health endpoint:

from starlette.routing import Route
from starlette.responses import JSONResponse

async def health(request):
    return JSONResponse({"status": "ok", "server": "my-mcp-server"})

# Add to your app's routes alongside the SSE endpoint

Rate Limiting

For public-facing servers, add rate limiting per client:

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import Response
from collections import defaultdict
import time

class RateLimitMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, max_requests: int = 60, window: int = 60):
        super().__init__(app)
        self.max_requests = max_requests
        self.window = window
        self.requests = defaultdict(list)

    async def dispatch(self, request, call_next):
        client_ip = request.client.host
        now = time.time()
        self.requests[client_ip] = [
            t for t in self.requests[client_ip] if now - t < self.window
        ]
        if len(self.requests[client_ip]) >= self.max_requests:
            return Response("Too Many Requests", status_code=429)
        self.requests[client_ip].append(now)
        return await call_next(request)
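The sliding-window logic is worth verifying in isolation. Here is the same idea extracted into a plain class with an injectable clock so tests don't have to sleep; the class name and `allow()` method are my own, not a starlette or MCP API:

```python
import time
from collections import defaultdict

class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window` seconds, per key."""
    def __init__(self, max_requests: int = 60, window: float = 60.0, clock=time.time):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock  # injectable for testing
        self.requests = defaultdict(list)

    def allow(self, key: str) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window
        self.requests[key] = [t for t in self.requests[key] if now - t < self.window]
        if len(self.requests[key]) >= self.max_requests:
            return False
        self.requests[key].append(now)
        return True
```

The middleware's dispatch then reduces to a single check: if `allow(request.client.host)` returns False, respond with 429. Note that per-IP state lives in process memory, so this only works for a single server instance; a shared store is needed behind a load balancer.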

Key Takeaways

  - Remote (HTTP/SSE) servers need authentication: OAuth 2.0 for public deployments, an API key for internal tools.
  - Sampling lets a server delegate language tasks to the LLM connected to the client, though not every host supports it.
  - Use protocol errors (McpError) for malformed requests and in-content error strings for failed operations, and always set timeouts on external calls.
  - Log structured JSON to stderr for operators, and use ctx.log() to surface messages in the host application.
  - For production, containerize the server, expose a health endpoint, and rate-limit public-facing deployments.


← Chapter 9: Connecting to Other Clients Table of Contents Chapter 11: Action Plan →