Tool Use and Integrations

How an agent acts on the world, and why tool design is a context budget problem.

Tools are a contract, not a feature list

Every tool definition occupies space in the context window and shapes how the model decides to act. A tool is not just a capability you're exposing — it's a contract between the agent and the world, and like any API contract, ambiguity in that contract produces unreliable behavior. The discipline of tool design matters as much as the discipline of prompt design.

The minimal viable toolset

The most common failure mode in tool-using agents is a bloated toolset: too many tools, overlapping responsibilities, unclear boundaries between when to use one versus another. A useful test before adding a tool: if a competent human engineer, looking at the current situation, couldn't confidently say which tool applies, the agent will not reliably do better. It's almost always better to maintain a small number of clearly scoped, non-overlapping tools than a large number of tools covering every conceivable variation.
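To make the context-budget point concrete, here is a rough sketch that estimates how much of the window a toolset consumes. It assumes the serialized JSON of each definition is what reaches the model and uses a crude four-characters-per-token heuristic; real counts depend on the model's tokenizer and on how the host renders tool schemas. All tool names are hypothetical.

```python
import json

# Crude heuristic (assumption): roughly four characters of JSON per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(tool: dict) -> int:
    """Approximate the context cost of one tool definition."""
    return len(json.dumps(tool)) // CHARS_PER_TOKEN + 1


def toolset_cost(tools: list[dict]) -> int:
    """Total approximate tokens spent on tool definitions before any task text."""
    return sum(estimate_tokens(t) for t in tools)


def make_tool(name: str, description: str) -> dict:
    # Hypothetical definition shape: name, description, and an object input schema.
    return {
        "name": name,
        "description": description,
        "inputSchema": {"type": "object", "properties": {"id": {"type": "string"}}},
    }


lean = [
    make_tool("search_tickets", "Find tickets matching a query."),
    make_tool("update_ticket_status", "Move a ticket to another workflow state."),
]

# A bloated set: the same two actions plus many near-duplicate variants.
bloated = lean + [
    make_tool(f"update_ticket_status_{variant}", "Update a ticket's status.")
    for variant in ("v2", "bulk", "urgent", "by_email", "by_owner", "legacy")
]

print(toolset_cost(lean), toolset_cost(bloated))
```

The exact numbers matter less than the shape: every redundant variant is carried in the prompt on every model call, whether or not the agent ever uses it.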

Hierarchical action space

Rather than building a dedicated tool for every possible operation, expose a small set of core, atomic tools for the most common and highest-value operations, and handle long-tail or complex operations through a more general-purpose execution tool — a sandboxed shell or code-execution tool, for instance — that can compose primitive operations as needed. This avoids the combinatorial explosion of tool definitions that comes from trying to anticipate every use case in advance.
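A minimal sketch of the hierarchical idea, under assumed names: a handful of atomic tools for the common operations plus one general-purpose `execute_code` tool, compared against a flat design that mints a dedicated tool for every verb/resource pair. The verbs, resources, and tool names are illustrative and not taken from any real product.

```python
from itertools import product

VERBS = ["create", "read", "update", "delete", "export", "archive"]
RESOURCES = ["ticket", "comment", "user", "project", "attachment"]


def flat_toolset() -> list[str]:
    """One dedicated tool per (verb, resource) combination."""
    return [f"{verb}_{resource}" for verb, resource in product(VERBS, RESOURCES)]


def hierarchical_toolset() -> list[dict]:
    """A few atomic, high-value tools plus one general-purpose escape hatch."""
    atomic = [
        {"name": "read_ticket", "kind": "atomic"},
        {"name": "update_ticket_status", "kind": "atomic"},
        {"name": "search_tickets", "kind": "atomic"},
    ]
    # Long-tail work (bulk exports, archiving, cross-resource joins) is written
    # by the model as a script and run by a sandboxed code-execution tool.
    general = {"name": "execute_code", "kind": "general", "languages": ["python"]}
    return atomic + [general]


print(len(flat_toolset()), len(hierarchical_toolset()))  # prints "30 4"
```

Adding a sixth resource grows the flat design by six more tools; the hierarchical set stays at four, because long-tail operations on the new resource can be composed in code.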

MCP as a standardization layer

The Model Context Protocol (MCP) standardizes how an agent discovers and calls tools exposed by external servers — calendars, ticketing systems, internal databases, third-party APIs — without each integration requiring bespoke glue code. This matters architecturally because it decouples "what tools exist" from "how the agent's core loop works": new integrations can be added by connecting a new MCP server rather than modifying the agent's reasoning logic.
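The decoupling can be sketched with the JSON-RPC 2.0 methods MCP uses for tools: `tools/list` for discovery and `tools/call` for invocation, whose result carries a `content` array and an `isError` flag. The in-process `ToyCalendarServer` is hypothetical and skips transport, the initialization handshake, and capability negotiation; the point is that the agent-side function only speaks the protocol, so pointing it at a different server requires no change to it.

```python
import json


class ToyCalendarServer:
    """Hypothetical in-process stand-in for an MCP server (no transport, no handshake)."""

    TOOLS = [{
        "name": "list_events",
        "description": "Use when the user asks what is on their calendar for a given day.",
        "inputSchema": {
            "type": "object",
            "properties": {"date": {"type": "string", "description": "ISO 8601 date"}},
            "required": ["date"],
        },
    }]

    @staticmethod
    def _error(request_id: int, code: int, message: str) -> dict:
        return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}

    def handle(self, request: dict) -> dict:
        """Answer the two tool-related methods; anything else is 'method not found'."""
        method, params = request["method"], request.get("params", {})
        if method == "tools/list":
            result = {"tools": self.TOOLS}
        elif method == "tools/call":
            if params.get("name") != "list_events":
                return self._error(request["id"], -32602, "Unknown tool")
            events = [f"Standup on {params['arguments']['date']}"]
            result = {"content": [{"type": "text", "text": json.dumps(events)}], "isError": False}
        else:
            return self._error(request["id"], -32601, "Method not found")
        return {"jsonrpc": "2.0", "id": request["id"], "result": result}


def discover_and_call(server, tool_name: str, arguments: dict) -> str:
    """Agent-side code: uses only tools/list and tools/call, never server internals."""
    listing = server.handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
    if tool_name not in {tool["name"] for tool in listing["result"]["tools"]}:
        raise ValueError(f"server does not expose {tool_name!r}")
    reply = server.handle({
        "jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })
    return reply["result"]["content"][0]["text"]


print(discover_and_call(ToyCalendarServer(), "list_events", {"date": "2025-01-31"}))
```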

Designing tool input parameters

Ambiguous or overly flexible input parameters are a frequent source of agent errors, not because the model can't follow instructions, but because the parameter schema didn't constrain the input enough to rule out a plausible but wrong choice. Favor explicit enums over free-text strings where the valid options are known in advance, validate inputs at the tool boundary rather than trusting the model to always supply well-formed arguments, and write parameter descriptions that match how the model actually reasons about the task instead of reading like a technically accurate API reference.

MCP tool schema: enum vs free-text status
{
  "strict_tool": {
    "name": "update_ticket_status",
    "description": "Use when the user wants to change ticket status.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "ticket_id": { "type": "string" },
        "status": {
          "type": "string",
          "enum": ["open", "in_progress", "resolved", "closed"]
        }
      },
      "required": ["ticket_id", "status"]
    }
  },
  "loose_tool": {
    "name": "update_ticket_status",
    "inputSchema": {
      "type": "object",
      "properties": {
        "ticket_id": { "type": "string" },
        "status": { "type": "string", "description": "Any valid status" }
      }
    }
  }
}
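Validating at the tool boundary can be as small as this sketch, which checks a call against the strict tool's `status` enum (plus required arguments and types) before anything touches the ticketing system. It is a hand-rolled subset of JSON Schema written for illustration; the `required` list and the error wording are assumptions, and a full JSON Schema validator would be the safer choice in practice.

```python
from typing import Any

# Mirrors the strict tool's parameters; "required" is an assumption added here.
SCHEMA = {
    "type": "object",
    "properties": {
        "ticket_id": {"type": "string"},
        "status": {"type": "string", "enum": ["open", "in_progress", "resolved", "closed"]},
    },
    "required": ["ticket_id", "status"],
}


def validate_args(args: dict[str, Any], schema: dict) -> list[str]:
    """Return human-readable problems; an empty list means the call may proceed."""
    errors = []
    props = schema["properties"]
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument '{name}'")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            errors.append(f"unexpected argument '{name}'")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"'{name}' must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"'{name}' must be one of {spec['enum']}, got {value!r}")
    return errors


print(validate_args({"ticket_id": "T-42", "status": "done"}, SCHEMA))
```

Returning the error text to the model rather than raising is one common design choice: it gives the agent a chance to retry with a valid enum value on its next step.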

Part II — MCP anatomy

The Model Context Protocol defines a client/server contract between an AI host and external capabilities. The host runs the agent loop and maintains one MCP client per server connection; each server exposes tools (actions), resources (readable data), and prompts (templated workflows). This separation means integration teams can ship MCP servers independently of the agent's core reasoning code. The official docs compare MCP to a USB-C port for AI applications, and the analogy is apt.

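Each server declares its capabilities when the connection is initialized and returns its tool schemas on request (the tools/list method); the host then injects only the schemas the agent is allowed to see for the current task. This is what makes dynamic tool loading and permission scoping possible.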
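A host-side sketch of permission scoping: tool listings from several servers are namespaced by server and filtered against a per-task allowlist before their schemas are injected into context. The server names, tool names, the `TaskPolicy` shape, and the `server__tool` naming convention are all hypothetical; the filtering policy is illustrative and not part of the protocol itself.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPolicy:
    """Which namespaced tools a given task may see (hypothetical policy shape)."""
    allowed: set[str] = field(default_factory=set)


def scoped_tool_schemas(listings: dict[str, list[dict]], policy: TaskPolicy) -> list[dict]:
    """Namespace each server's tools and keep only those the task is allowed to use."""
    visible = []
    for server, tools in listings.items():
        for tool in tools:
            qualified = f"{server}__{tool['name']}"
            if qualified in policy.allowed:
                visible.append({**tool, "name": qualified})
    return visible


# Stand-ins for the tools/list results gathered from two connected servers.
listings = {
    "calendar": [{"name": "list_events"}, {"name": "create_event"}],
    "crm": [{"name": "get_contact"}, {"name": "export_contacts"}],
}
triage = TaskPolicy(allowed={"calendar__list_events", "crm__get_contact"})
print([t["name"] for t in scoped_tool_schemas(listings, triage)])
```

Tools outside the allowlist, such as `export_contacts`, never enter the context at all, so the model cannot be talked into calling them.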

Part II — Tool schema minimalism and overlap audits

Before adding a tool, run an overlap audit: list the existing tools, describe each in one sentence, and mark every pair where a competent engineer could not instantly decide which one applies. Merge or split tools until the boundaries are crisp. Prefer enums over free text for any parameter with a known, finite set of values; on free-text fields the model tends to guess plausible but invalid values.

Tool and parameter descriptions should explain when to use the tool in task terms, not merely describe the underlying REST endpoint accurately. "Use when the user wants to change ticket status" beats "Updates a ticket object."
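The audit itself is a human judgment call, but a crude script can surface candidate pairs for review. This sketch assumes each tool has a one-sentence description and flags pairs whose word sets overlap heavily (Jaccard similarity at or above an arbitrary 0.6 threshold). Lexical overlap is only a proxy, so it will miss some genuinely ambiguous pairs and flag some harmless ones; the tool names are hypothetical.

```python
import re
from itertools import combinations


def words(text: str) -> set[str]:
    """Lowercased alphabetic tokens of a description."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def overlap_candidates(descriptions: dict[str, str], threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Return tool pairs whose one-sentence descriptions look suspiciously alike."""
    flagged = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(descriptions.items(), 2):
        score = jaccard(words(desc_a), words(desc_b))
        if score >= threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return sorted(flagged, key=lambda item: item[2], reverse=True)


tools = {
    "update_ticket_status": "Use when the user wants to change the status of a ticket.",
    "set_ticket_state": "Use when the user wants to change the state of a ticket.",
    "search_tickets": "Use when the user wants to find tickets matching a query.",
}
print(overlap_candidates(tools))
```

Here only the status/state pair is flagged; the reviewer then decides whether to merge the two tools or rewrite their descriptions so the boundary is obvious.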

Part II — Code execution pattern

Anthropic's engineering article on code execution with MCP describes an important optimization: instead of passing large tool results through the context window, the agent writes a short script that runs in an isolated sandbox, processes the data there, and returns a compact summary. A 10,000-line API response can shrink to around fifty tokens of structured output. This is both a latency win and a context-hygiene win.

The harness must enforce sandbox boundaries — filesystem scope, network allowlists, execution timeouts — independent of model compliance.
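A sketch of the harness side of this pattern, under loud assumptions: the model-written script runs in a separate Python process with a timeout and an empty environment, and only a size-capped slice of its stdout is returned for injection into context. A subprocess is not a sandbox; the filesystem scoping and network allowlists described above need OS- or container-level isolation, which this sketch does not provide. The synthetic 10,000-row dataset stands in for a large API response.

```python
import subprocess
import sys

MAX_OUTPUT_CHARS = 500  # rough stand-in for a token budget on injected results


def run_sandboxed(script: str, timeout_s: float = 10.0) -> str:
    """Run a model-written script and return only a compact, capped result."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", script],  # -I: isolated mode, ignores PYTHON* env vars and user site-packages
            capture_output=True,
            text=True,
            timeout=timeout_s,
            env={},  # do not leak the harness's credentials through environment variables
        )
    except subprocess.TimeoutExpired:
        return "error: script exceeded time limit"
    if proc.returncode != 0:
        return "error: " + proc.stderr.strip()[-MAX_OUTPUT_CHARS:]
    return proc.stdout.strip()[:MAX_OUTPUT_CHARS]


# The script processes the large dataset locally and prints only a summary.
SCRIPT = """
import json
rows = [{"id": i, "domain": ["a.com", "b.org", "c.net"][i % 3]} for i in range(10_000)]
counts = {}
for row in rows:
    counts[row["domain"]] = counts.get(row["domain"], 0) + 1
print(json.dumps({"count": len(rows), "by_domain": counts}))
"""

print(run_sandboxed(SCRIPT))
```

The raw rows never enter the context; the model sees a single line of JSON with the count and per-domain totals.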

Part II — Security boundary for third-party servers

Treat every MCP server like a dependency with supply-chain risk. Vet sources, pin versions, run servers with least-privilege credentials, and never let a server expose tools outside its declared scope. Prompt-level "do not exfiltrate data" is insufficient; network and credential isolation is the enforcement layer.

Case study: an agent was connected to a community MCP calendar server and an internal CRM server. A malicious calendar event description instructed the agent to export CRM contacts. Fix: use separate MCP clients with scoped credentials, block cross-server tool chains without human approval, and log all tool-call parameters to the observability pipeline.
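Two of the case-study fixes can be sketched as a small gate in the harness: every tool call is logged with its parameters, and once a turn has consumed output from one server, a call to a different server is held for human approval. The `ToolGate` class, the server and tool names, and the approval callback are hypothetical; per-server credential scoping would live in how each MCP client is configured, which this sketch does not cover.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool_calls")


class ToolGate:
    """Blocks cross-server tool chains within one agent turn unless a human approves."""

    def __init__(self, approve: Callable[[str, str, dict], bool]):
        self.approve = approve
        self.servers_read_this_turn: set[str] = set()

    def new_turn(self) -> None:
        self.servers_read_this_turn.clear()

    def allow(self, server: str, tool: str, params: dict) -> bool:
        # Log every attempted call, including its parameters, before deciding.
        log.info("tool_call server=%s tool=%s params=%r", server, tool, params)
        other_sources = self.servers_read_this_turn - {server}
        if other_sources and not self.approve(server, tool, params):
            log.warning("blocked cross-server chain %s -> %s.%s", sorted(other_sources), server, tool)
            return False
        # This server's output will now be in context for the rest of the turn.
        self.servers_read_this_turn.add(server)
        return True


# With no human available, deny every cross-server chain.
gate = ToolGate(approve=lambda server, tool, params: False)
print(gate.allow("calendar", "list_events", {"date": "2025-01-31"}))  # True
print(gate.allow("crm", "export_contacts", {"format": "csv"}))        # False
```

In the case-study scenario, the injected calendar text can still reach the model, but the follow-on CRM export cannot execute without a person signing off.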

Compact execution result — sandbox pattern
{
  "tool": "execute_code",
  "input": {
    "language": "python",
    "script": "import json\nrows = fetch_crm('contacts', limit=500)\nprint(json.dumps({'count': len(rows), 'domains': top_domains(rows)}))"
  },
  "sandbox": { "network": "crm_api_only", "timeout_ms": 10000 },
  "context_injection": {
    "mode": "summary_only",
    "max_tokens": 120
  }
}

Further reading

How the agent acts in the world. The major recent shift here has been the standardization of integrations, which avoids the "API schema nightmare" of bespoke tool schemas tightly coupled to the prompt.

  • Model Context Protocol (MCP) by Anthropic: Launched in late 2024, MCP has quickly become a de facto standard for connecting agents to local repositories, databases, and external tools through a client/server architecture that isolates tool schema design from the LLM's logic.
  • Code execution with MCP (Anthropic Engineering): A brilliant article on how delegating code execution to isolated environments reduces latency and the token cost of tool results.