An LLM without tools is like a brilliant consultant locked in a room with no phone, no computer, and no internet. They can reason and draft, but they cannot verify facts, run calculations, access current data, or take action in the world. Tool Use unlocks all of this.
Tool Use is the pattern in which an LLM is given descriptions of available functions and can request to call them during its reasoning process. The LLM generates a structured request (typically JSON), the runtime executes the function, and the result is fed back into the LLM’s context for further processing.
The lifecycle of a tool call:
┌─────────┐    ┌──────────────┐    ┌──────────────┐    ┌─────────────┐
│  User   │───►│ LLM reasons  │───►│ LLM generates│───►│   Runtime   │
│  Query  │    │ about task   │    │  tool call   │    │  executes   │
└─────────┘    └──────────────┘    └──────────────┘    │  function   │
                                                       └──────┬──────┘
┌─────────┐    ┌──────────────┐                               │
│  Final  │◄───│ LLM generates│◄──────────────────────────────┘
│ Answer  │    │ using result │      (result returned)
└─────────┘    └──────────────┘
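The loop in the diagram can be sketched in a few lines of Python. This is a minimal sketch, not any particular SDK's API: the `llm` callable and the `tools` mapping are placeholders for a real model client and real functions.

```python
import json

def run_tool_loop(llm, tools, user_query, max_steps=5):
    """Drive the tool-use lifecycle: each turn, the LLM either requests
    tool calls or produces a final answer; tool results are appended to
    the conversation until the LLM answers directly."""
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):
        reply = llm(messages)                      # model turn
        if "tool_calls" not in reply:
            return reply["content"]                # final answer
        # Record the assistant's tool-call request in the transcript.
        messages.append({"role": "assistant", "tool_calls": reply["tool_calls"]})
        for call in reply["tool_calls"]:
            # Runtime executes the function and feeds the result back.
            result = tools[call["name"]](**call["arguments"])
            messages.append({
                "role": "tool",
                "name": call["name"],
                "content": json.dumps(result),
            })
    return "Step limit reached without a final answer."
```

The `max_steps` bound is the one non-obvious design choice: without it, a model that keeps requesting tools would loop forever.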
Tools are described to the LLM using structured schemas. The quality of these descriptions directly affects how well the LLM uses the tools.
# See code/tool_use.py for the full implementation
tools = [
    {
        "name": "get_weather",
        "description": (
            "Get the current weather for a specific city. "
            "Returns temperature, conditions, and humidity. "
            "Use when the user asks about weather or needs "
            "weather data for planning."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g., 'Paris' or 'New York'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit preference"
                }
            },
            "required": ["city"]
        }
    }
]
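Because the model can emit malformed arguments, the runtime should check them against this schema before executing anything. A minimal stdlib-only sketch of that check (a library such as `jsonschema` handles the full specification):

```python
def validate_arguments(schema, arguments):
    """Check tool-call arguments against a JSON-Schema-style parameter
    spec. Returns a list of problems; an empty list means valid."""
    problems = []
    props = schema.get("properties", {})
    # Every required parameter must be present.
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required parameter: {name}")
    # Every supplied parameter must be known, typed, and in-enum.
    for name, value in arguments.items():
        spec = props.get(name)
        if spec is None:
            problems.append(f"unknown parameter: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems
```

Returning a list of problems rather than raising on the first one lets the runtime report all issues back to the LLM in a single message.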
Given the user query and tool descriptions, the LLM decides whether to call a tool and which one:
{
  "tool_calls": [
    {
      "name": "get_weather",
      "arguments": {
        "city": "Paris",
        "units": "celsius"
      }
    }
  ]
}
The runtime calls the actual function and returns the result to the LLM:
{
  "role": "tool",
  "name": "get_weather",
  "content": "{\"temperature\": 18, \"conditions\": \"partly cloudy\", \"humidity\": 65}"
}
The LLM incorporates the tool result into its response:
“The current weather in Paris is 18°C with partly cloudy skies and 65% humidity.”
As the number of available tools grows, a challenge emerges: the LLM’s context window fills up with tool descriptions, leaving less room for the actual task. Research from the Gorilla project (Patil et al., 2023) addresses this with tool retrieval — using the same technique as RAG but applied to tool descriptions:
# See code/tool_use.py for the full implementation
class ToolRegistry:
    def __init__(self, all_tools):
        self.tools = {t["name"]: t for t in all_tools}
        self.index = VectorIndex()
        for tool in all_tools:
            self.index.add(tool["name"], tool["description"])

    def get_relevant_tools(self, query, k=5):
        """Retrieve the k most relevant tools for a query."""
        relevant_names = self.index.search(query, k=k)
        return [self.tools[name] for name in relevant_names]
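`VectorIndex` above stands in for any embedding-based index. To see the retrieval mechanics without an embedding model, a toy word-overlap index can play the same role; this is purely illustrative, and a real system would rank by embedding similarity:

```python
class KeywordIndex:
    """Toy stand-in for an embedding index: ranks documents by the
    number of words they share with the query."""
    def __init__(self):
        self.docs = {}

    def add(self, name, text):
        self.docs[name] = set(text.lower().split())

    def search(self, query, k=5):
        words = set(query.lower().split())
        ranked = sorted(self.docs, key=lambda n: -len(words & self.docs[n]))
        return ranked[:k]

# Usage: index two tool descriptions, then retrieve for a query.
index = KeywordIndex()
index.add("get_weather", "get the current weather for a city")
index.add("send_email", "send an email message to a recipient")
```

With `index.search("what is the weather like", k=1)`, only `get_weather` survives the cut, so only its schema needs to enter the prompt.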
One of the key advantages of the CodeAct approach (using Python code as the action format) is that it allows natural tool composition — calling multiple tools in sequence, using the output of one as the input to another:
# Agent-generated code action (CodeAct style)
weather = get_weather("Paris")
if weather["temperature"] > 25:
    activities = search_activities("Paris", "outdoor")
else:
    activities = search_activities("Paris", "indoor")

calendar = get_calendar("today")
available_slots = find_free_slots(calendar)

recommendation = f"Given the {weather['conditions']} weather at {weather['temperature']}°C, "
recommendation += f"I suggest: {activities[0]['name']} at {available_slots[0]}"
With JSON tool calls, this same logic would require multiple round trips to the LLM — one for each tool call and decision point.
Tools fail. APIs go down, rate limits kick in, inputs are invalid. Robust tool-using agents need strategies for handling failures:
# See code/tool_use.py for the full implementation
def execute_tool_safely(tool_call, tools):
    tool_fn = tools.get(tool_call.name)
    if tool_fn is None:
        return {
            "error": f"Unknown tool: {tool_call.name}",
            "available_tools": list(tools.keys())
        }
    try:
        result = tool_fn(**tool_call.arguments)
        return {"success": True, "result": result}
    except ValidationError as e:
        return {"error": f"Invalid arguments: {e}"}
    except RateLimitError:
        return {"error": "Rate limited. Try again in a few seconds."}
    except Exception as e:
        return {"error": f"Tool execution failed: {type(e).__name__}: {e}"}
Returning structured error messages back to the LLM allows it to recover — it might retry with corrected arguments, switch to an alternative tool, or inform the user about the limitation.
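One recovery strategy is a bounded retry loop: execute the call, and on failure hand the error message back to the model for corrected arguments. A sketch under simplified assumptions — `fix_arguments` stands in for the model turn that proposes a fix, and the error handling is collapsed into a single catch-all:

```python
def execute_safely(tool_fn, arguments):
    """Run a tool, converting any exception into a structured error."""
    try:
        return {"success": True, "result": tool_fn(**arguments)}
    except Exception as e:
        return {"error": f"{type(e).__name__}: {e}"}

def call_with_recovery(fix_arguments, tool_fn, arguments, max_retries=2):
    """On failure, feed the structured error back to the model
    (here: the fix_arguments callback) and retry with its fix."""
    outcome = execute_safely(tool_fn, arguments)
    for _ in range(max_retries):
        if "error" not in outcome:
            break
        arguments = fix_arguments(arguments, outcome["error"])
        outcome = execute_safely(tool_fn, arguments)
    return outcome
```

Bounding the retries matters: a model that keeps producing the same bad arguments should fail fast and surface the error to the user instead of burning tokens.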
Tool Use is one of the two most mature and reliable agentic patterns (alongside Reflection). Use it whenever the task requires anything beyond the model's own weights: verifying facts, fetching current data, running precise calculations, or taking action in external systems.
Anthropic emphasizes that investing in Agent-Computer Interface (ACI) design is just as important as prompt engineering. Their recommendations:
One example: use descriptive parameter names — city_name is better than loc.

“While building our agent for SWE-bench, we actually spent more time optimizing our tools than the overall prompt.” — Anthropic