Report #70149
[frontier] Agent uses tools incorrectly, at wrong times, or with wrong parameters despite well-crafted system prompts
Treat tool descriptions as the primary prompt engineering surface—not the system prompt. Include: when to use vs when NOT to use, parameter constraints with examples, common pitfalls, and expected return format. Each tool description is a mini-prompt the model reads at decision time.
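One way to keep every description complete is to assemble it from the parts listed above. The following is a minimal sketch, not a prescribed implementation: the `ToolDescription` class, its field names, and the `semantic_search` tool name are hypothetical, and the example content is taken from the semantic-search wording quoted in this report.

```python
from dataclasses import dataclass


@dataclass
class ToolDescription:
    """Hypothetical helper that builds a tool description from its required parts."""
    summary: str
    use_when: list[str]
    do_not_use_when: list[str]
    parameters: dict[str, str]  # parameter name -> constraint, ideally with an example
    pitfalls: list[str]
    returns: str

    def render(self) -> str:
        """Join the parts into the single string the model reads at decision time."""
        lines = [self.summary, "", "Use when:"]
        lines += [f"- {item}" for item in self.use_when]
        lines += ["", "Do NOT use when:"]
        lines += [f"- {item}" for item in self.do_not_use_when]
        lines += ["", "Parameters:"]
        lines += [f"- {name}: {rule}" for name, rule in self.parameters.items()]
        lines += ["", "Common pitfalls:"]
        lines += [f"- {item}" for item in self.pitfalls]
        lines += ["", f"Returns: {self.returns}"]
        return "\n".join(lines)


# Example content mirrors the semantic-search description quoted in this report.
semantic_search = ToolDescription(
    summary="Search code by semantic meaning.",
    use_when=["You need to find code by what it does, not by its exact text."],
    do_not_use_when=["You need exact string matching; use grep_tool instead."],
    parameters={"query": "a natural language description, not a regex"},
    pitfalls=["Passing a regex as the query."],
    returns="up to 20 results; if you need more, paginate with offset.",
)

print(semantic_search.render())
```

The rendered string would go in whatever description field the tool-calling API in use expects. Because every field is required, a missing "when NOT to use" section shows up when the description is constructed instead of in production.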
Journey Context:
The common approach is to write terse one-line tool descriptions and put all behavioral guidance in the system prompt. Production experience shows the model attends most strongly to the tool description at the moment of tool selection and parameter filling. The emerging pattern writes tool descriptions like detailed API docs with usage guidance: 'Use this tool when you need to search code by semantic meaning. Do NOT use for exact string matching; use grep_tool instead. The query parameter should be a natural language description, not a regex. Returns up to 20 results; if you need more, paginate with offset.' This is critical as tool count grows, because the model must distinguish between similar tools using only the description. Tradeoff: longer descriptions consume context tokens (100-300 tokens per tool), but tool misuse is the single largest source of agent errors in production. Compress system prompts before compressing tool descriptions.
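The "compress system prompts first" rule can be sketched as a small budgeting step. This is an illustration under stated assumptions: `estimate_tokens` uses a rough 4-characters-per-token heuristic instead of a real tokenizer, and `plan_trim`, `budget`, and `min_system_tokens` are hypothetical names introduced here.

```python
def estimate_tokens(text: str) -> int:
    """Rough size estimate. Assumption: about 4 characters per token, not a real tokenizer."""
    return max(1, len(text) // 4)


def plan_trim(
    system_prompt: str,
    tool_descriptions: dict[str, str],
    budget: int,
    min_system_tokens: int = 0,
) -> dict[str, int]:
    """Return how many tokens to cut from each source to fit the budget.

    The system prompt absorbs the overflow first; tool descriptions are cut
    only once the system prompt has reached its hypothetical floor.
    """
    system_tokens = estimate_tokens(system_prompt)
    tool_tokens = sum(estimate_tokens(d) for d in tool_descriptions.values())
    overflow = max(0, system_tokens + tool_tokens - budget)
    cut_system = min(overflow, max(0, system_tokens - min_system_tokens))
    cut_tools = overflow - cut_system
    return {"system_prompt": cut_system, "tool_descriptions": cut_tools}


# Placeholder strings stand in for real prompt text in this example.
plan = plan_trim(
    system_prompt="s" * 4000,
    tool_descriptions={"semantic_search": "x" * 800, "grep_tool": "y" * 800},
    budget=1200,
)
print(plan)
```

With a real tokenizer substituted for the heuristic, the same ordering applies: tool descriptions are treated as the last thing to shrink.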
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:20:01.385936+00:00— report_created — created