This chapter covers advanced MCP features: OAuth authentication for remote servers, sampling (servers requesting LLM completions), robust error handling, logging, and production deployment patterns.
Remote MCP servers (HTTP/SSE) are accessible over the network, so they need authentication. The MCP specification supports OAuth 2.0 for this purpose.
The Python SDK includes OAuth middleware for FastMCP servers:
```python
from mcp.server.fastmcp import FastMCP
from mcp.server.auth import OAuthMiddleware

mcp = FastMCP("secure-server")

# Attach OAuth middleware
app = mcp.sse_app()
app = OAuthMiddleware(app, config={
    "authorization_server": "https://auth.example.com",
    "client_id": "my-mcp-server",
    "required_scopes": ["mcp:read", "mcp:write"],
})
```
When a client connects to an authenticated server, the server returns a 401 response with OAuth metadata. The host application (Claude Desktop, Cursor, etc.) handles the OAuth flow — presenting a login page, obtaining a token, and attaching it to subsequent requests.
For custom clients, implement the OAuth 2.0 PKCE flow to obtain and refresh tokens.
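As a rough sketch of one PKCE building block, the code verifier and its S256 challenge can be generated with the standard library alone (the helper name `make_pkce_pair` is ours, not part of any SDK):

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge (RFC 7636)."""
    # 32 random bytes -> 43-char URL-safe verifier, padding stripped
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# Send `challenge` (with code_challenge_method=S256) in the authorization
# request; send `verifier` later when exchanging the code for tokens.
```

The client keeps `verifier` secret and sends only `challenge` up front, so an intercepted authorization code is useless without the verifier.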
For internal tools, a simple API key approach is often sufficient:
```python
import secrets

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import Response

class APIKeyMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, api_key: str):
        super().__init__(app)
        self.api_key = api_key

    async def dispatch(self, request, call_next):
        key = request.headers.get("x-api-key", "")
        # Constant-time comparison avoids leaking the key via timing
        if not secrets.compare_digest(key, self.api_key):
            return Response("Unauthorized", status_code=401)
        return await call_next(request)
```
Sampling is an advanced MCP feature that allows a server to request a completion from the LLM connected to the client. This enables powerful patterns: the server can delegate classification, summarization, or other language tasks to the model without bundling its own model API key.
The server sends a sampling/createMessage request to the client. The client passes it to the host, which passes it to the LLM. The result flows back to the server.
```python
from mcp.server.fastmcp import FastMCP
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("sampling-demo")

@mcp.tool()
async def classify_and_route(text: str) -> str:
    """Classify text and route it to the right handler."""
    # Ask the connected LLM to classify the text
    result = await mcp.get_context().session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(
                    type="text",
                    text=f"Classify this as 'bug', 'feature', or 'question': {text}",
                ),
            )
        ],
        max_tokens=10,
    )
    category = result.content.text.strip().lower()
    return handle_category(category, text)
```
**Important:** Sampling requires the host to support it (not all clients do). Check `session.capabilities.sampling` before using it.
Inside a tool handler, you can access the current MCP context for advanced operations:
```python
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("context-demo")

@mcp.tool()
async def long_running_tool(ctx: Context, data: str) -> str:
    """A tool that reports progress."""
    await ctx.report_progress(0, 100, "Starting...")
    result = process_step_1(data)
    await ctx.report_progress(50, 100, "Halfway done...")
    result = process_step_2(result)
    await ctx.report_progress(100, 100, "Complete")
    return result
```
The Context object provides:

- `ctx.report_progress(current, total, message)` — send progress notifications
- `ctx.log(level, message)` — send log messages to the client
- `ctx.session` — access to the underlying MCP session

To receive a Context, add it as the first parameter of your tool function.
MCP has two error channels:

- **Protocol-level errors** — the call itself is invalid; raise `McpError` with an error code.
- **Tool-level errors** — the call was valid but the operation failed; return error text in the result.

```python
from mcp.types import McpError, ErrorCode

@mcp.tool()
def get_record(record_id: int) -> str:
    if record_id <= 0:
        # Protocol-level error: invalid input
        raise McpError(ErrorCode.INVALID_PARAMS, "record_id must be positive")
    record = db.find(record_id)
    if record is None:
        # Tool-level error: operation failed, but the call was valid
        return f"Error: no record with ID {record_id}"
    return record.to_json()
```
For tools that call external services, always set timeouts:
```python
import httpx

@mcp.tool()
async def call_api(endpoint: str) -> str:
    """Call an external API with a 10-second timeout."""
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            response = await client.get(endpoint)
            response.raise_for_status()
            return response.text
    except httpx.TimeoutException:
        return "Error: request timed out after 10 seconds"
    except httpx.HTTPStatusError as e:
        return f"Error: HTTP {e.response.status_code}"
```
Use Python’s logging module with structured output:
```python
import json
import logging
import sys

class JSONFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            "tool": getattr(record, "tool", None),
        })

# Log to stderr: on the stdio transport, stdout is reserved for the protocol
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(JSONFormatter())
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)
```
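The per-record `tool` field is populated via `extra=` when logging. A self-contained sketch (the formatter is repeated here, and the tool name is illustrative; a `StringIO` stands in for stderr so the output can be inspected):

```python
import io
import json
import logging

class JSONFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            "tool": getattr(record, "tool", None),
        })

buf = io.StringIO()  # stand-in for sys.stderr so we can read the output back
handler = logging.StreamHandler(buf)
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Keys in `extra` become attributes on the LogRecord, which the formatter reads
logger.info("fetched 3 rows", extra={"tool": "get_record"})

entry = json.loads(buf.getvalue())
print(entry["level"], entry["tool"])  # → INFO get_record
```

Because unknown `extra` keys simply become record attributes, the same formatter works for records logged with or without a `tool` field (it falls back to `null`).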
Use ctx.log() to send log messages that appear in the host application:
```python
@mcp.tool()
async def process(ctx: Context, data: str) -> str:
    await ctx.log("info", f"Processing {len(data)} bytes of data")
    result = do_work(data)
    await ctx.log("info", "Processing complete")
    return result
```
Containerize your server for consistent deployment:
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY pyproject.toml .
COPY src/ src/
RUN pip install .
EXPOSE 8000
CMD ["python", "-m", "my_server", "--transport", "sse"]
```
For HTTP servers, add a health endpoint:
```python
from starlette.routing import Route
from starlette.responses import JSONResponse

async def health(request):
    return JSONResponse({"status": "ok", "server": "my-mcp-server"})

# Add to your app's routes alongside the SSE endpoint
```
For public-facing servers, add rate limiting per client:
```python
import time
from collections import defaultdict

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import Response

class RateLimitMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, max_requests: int = 60, window: int = 60):
        super().__init__(app)
        self.max_requests = max_requests
        self.window = window
        self.requests = defaultdict(list)

    async def dispatch(self, request, call_next):
        client_ip = request.client.host
        now = time.time()
        # Drop timestamps that have aged out of the window
        self.requests[client_ip] = [
            t for t in self.requests[client_ip] if now - t < self.window
        ]
        if len(self.requests[client_ip]) >= self.max_requests:
            return Response("Too Many Requests", status_code=429)
        self.requests[client_ip].append(now)
        return await call_next(request)
```
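The sliding-window rule above can be sanity-checked without the HTTP layer. This sketch replays explicit timestamps through the same pruning logic (class name and limits are ours, chosen for the demo):

```python
from collections import defaultdict

class SlidingWindowLimiter:
    """Same pruning rule as RateLimitMiddleware, minus Starlette."""
    def __init__(self, max_requests: int = 3, window: float = 60.0):
        self.max_requests = max_requests
        self.window = window
        self.requests = defaultdict(list)

    def allow(self, client: str, now: float) -> bool:
        # Keep only timestamps still inside the window, then check the count
        self.requests[client] = [
            t for t in self.requests[client] if now - t < self.window
        ]
        if len(self.requests[client]) >= self.max_requests:
            return False
        self.requests[client].append(now)
        return True

limiter = SlidingWindowLimiter(max_requests=3, window=60.0)
print([limiter.allow("10.0.0.1", t) for t in (0, 1, 2, 3)])
# → [True, True, True, False]   (4th request inside the window is rejected)
print(limiter.allow("10.0.0.1", 61.5))
# → True   (early timestamps have aged out of the 60 s window)
```

Note that this is per-process state: behind a load balancer with several replicas, each replica enforces its own window, so a shared store (e.g. Redis) is needed for a global limit.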
- Use `Context` to report progress and send log messages to the client
- Distinguish protocol errors (raise `McpError`) from tool errors (return error text)

| ← Chapter 9: Connecting to Other Clients | Table of Contents | Chapter 11: Action Plan → |