Report #36737
[synthesis] What should I prioritize when designing tools for an LLM agent?
Design tool interfaces as your primary abstraction boundary. Each tool should be atomic (does one thing), observable (returns structured status), and safe (side effects are reversible or gated). The tool set defines what the agent CAN do, so invest in tool design before prompt engineering.
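As a rough illustration of the atomic, observable, and safe properties, here is a minimal sketch of a single gated tool. The names `ToolResult`, `delete_record`, and the `confirmed` flag are hypothetical and chosen for this example; they do not come from any particular agent framework.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json


@dataclass
class ToolResult:
    """Structured status returned by the tool (hypothetical shape)."""
    ok: bool
    status: str  # one of "done", "needs_confirmation", "error"
    detail: Optional[str] = None


def delete_record(store: dict, key: str, confirmed: bool = False) -> ToolResult:
    """Atomic: deletes exactly one record. Gated: refuses without confirmation."""
    if key not in store:
        return ToolResult(ok=False, status="error", detail=f"no record named {key!r}")
    if not confirmed:
        # The delete cannot be undone, so report what would happen and stop.
        return ToolResult(ok=False, status="needs_confirmation",
                          detail=f"would delete {key!r}")
    del store[key]
    return ToolResult(ok=True, status="done", detail=f"deleted {key!r}")


store = {"a": 1, "b": 2}
first = delete_record(store, "a")                   # gated: nothing is deleted yet
second = delete_record(store, "a", confirmed=True)  # explicit confirmation
print(json.dumps(asdict(first)))
print(json.dumps(asdict(second)))
```

The agent never sees free text here: every outcome, including the refusal, is a machine-readable status that the calling loop can branch on.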
Journey Context:
A common mistake is treating tools as thin wrappers around existing APIs and focusing effort on prompt engineering. Production AI products invert this priority. Devin's architecture defines a small, carefully scoped tool set: terminal (run commands), editor (read and write files), and browser (view web pages). Each tool has clear input and output schemas and observable side effects. ChatGPT's Code Interpreter tools are similarly bounded: Python execution, file upload, and file download. Cursor's agent tools are scoped to file operations and a terminal: read, write, search, and run commands. Perplexity's tool is essentially one thing: search. The synthesis: the tool interface IS the product architecture. It defines the agent's capability boundary, its failure modes, and its observability surface. Well-designed tools are atomic (one responsibility, so the LLM can choose the right tool), structured (they return JSON rather than free text, so the LLM can parse results reliably), idempotent where possible (so they are safe to retry), and reversible or gated behind confirmation (so side effects can be undone or approved first). Poorly designed tools, with overly broad APIs, ambiguous return values, or irreversible side effects, cause more agent failures than bad prompts. The tool schema is also the primary artifact for testing: you can unit test tool behavior without involving the LLM at all.
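The last point, testing a tool without the LLM, can be sketched as follows. This is an illustrative example under stated assumptions: `write_file_tool` and its JSON fields (`ok`, `status`, `path`, `error`) are hypothetical names, not the interface of Devin, Cursor, or any other product mentioned above.

```python
import json
import tempfile
from pathlib import Path


def write_file_tool(path: str, content: str) -> str:
    """Write content to path and return a JSON status string.

    Idempotent: a retry with the same arguments leaves the file as it is
    and reports "unchanged" instead of "written".
    """
    target = Path(path)
    try:
        if target.exists() and target.read_text(encoding="utf-8") == content:
            return json.dumps({"ok": True, "status": "unchanged", "path": path})
        target.write_text(content, encoding="utf-8")
        return json.dumps({"ok": True, "status": "written", "path": path})
    except OSError as exc:
        # Failures are reported in the same structured shape as successes.
        return json.dumps({"ok": False, "status": "error",
                           "path": path, "error": str(exc)})


def test_write_file_tool_is_idempotent() -> None:
    """Plain unit test: no model, no prompt, only the tool contract."""
    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "notes.txt")
        first = json.loads(write_file_tool(path, "hello"))
        retry = json.loads(write_file_tool(path, "hello"))
        assert first["status"] == "written"
        assert retry["status"] == "unchanged"
        assert Path(path).read_text(encoding="utf-8") == "hello"


test_write_file_tool_is_idempotent()
```

Because the tool's contract is just arguments in and JSON out, the same test can run in an ordinary test suite, and a failure points at the tool rather than at the prompt.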
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:08:28.817588+00:00— report_created — created