Report #48106
[frontier] Agent selects the wrong tool or passes wrong parameters despite a detailed system prompt
Treat tool descriptions as your highest-leverage prompt engineering surface. Write 2-4 sentence descriptions that cover what the tool does, when to use it, when NOT to use it, and 1-2 concrete parameter examples. The tool description IS the prompt for tool selection.
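The checklist above is mechanical enough to lint automatically. This is a minimal sketch that assumes tool definitions are Python dicts shaped like the `name`/`description`/`input_schema` fields of Anthropic-style tool definitions. The `search_orders` tool, its parameters, `search_products`, and the helper functions are hypothetical. The sentence splitter is a naive regex heuristic, not a real tokenizer. The linter checks the sentence count and looks for sentences that begin with "Use", "Do NOT use", and "Example:".

```python
import re

# Hypothetical tool definition (not from the report), shaped like the
# name/description/input_schema fields of Anthropic-style tool definitions.
SEARCH_ORDERS = {
    "name": "search_orders",
    "description": (
        "Searches the orders database by customer email or order ID and returns "
        "up to 20 matching orders. "
        "Use when the user asks about the status, contents, or history of a specific order. "
        "Do NOT use for product catalog questions; use search_products instead. "
        'Example: search_orders(query="jane@example.com", limit=5).'
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Customer email or order ID."},
            "limit": {"type": "integer", "minimum": 1, "maximum": 20},
        },
        "required": ["query"],
    },
}


def split_sentences(text: str) -> list[str]:
    # Naive heuristic: a sentence ends at ., ! or ? followed by whitespace.
    # Dots inside names like "jane@example.com" are not followed by
    # whitespace, so they do not cause a split.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def lint_description(description: str) -> list[str]:
    """Return a list of problems; an empty list means the description passes."""
    sentences = split_sentences(description)
    lowered = [s.lower() for s in sentences]
    problems = []
    if not 2 <= len(sentences) <= 4:
        problems.append(f"{len(sentences)} sentence(s); aim for 2-4")
    if not any(s.startswith("use ") for s in lowered):
        problems.append("missing 'Use ...' (when to use)")
    if not any(s.startswith("do not use") for s in lowered):
        problems.append("missing 'Do NOT use ...' (when not to use)")
    if not any(s.startswith("example:") for s in lowered):
        problems.append("missing 'Example: ...' (concrete parameters)")
    return problems


if __name__ == "__main__":
    for desc in (SEARCH_ORDERS["description"], "Searches the database"):
        print(lint_description(desc) or "ok")
```

When you run it, the script prints `ok` for the hypothetical `search_orders` description and reports four problems for the bare "Searches the database". The starting-phrase matching is deliberately strict so the lint stays predictable. You can adapt the patterns to your own house phrasing.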
Journey Context:
Developers spend hours crafting system prompts but write tool descriptions as afterthoughts, for example 'Searches the database' with no guidance on when to call the tool, how to call it, or what to pass. In practice, LLMs select tools based primarily on the tool name and description text rather than on the system prompt. The system prompt sets general behavior, while tool descriptions drive the choice of a specific tool. Production teams at Anthropic and OpenAI have reported that improving tool descriptions eliminates more tool-selection errors than system prompt changes do. People get this wrong in two directions. Descriptions can be too short ('Executes code', which leaves the model guessing about language, timeout, and safety). They can also be too long (a paragraph of caveats that confuses the model until it avoids the tool). The sweet spot is 2-4 sentences with explicit scope boundaries: 'Executes Python code in a sandbox. Use for data analysis, math, and string manipulation. Do NOT use for file I/O or network requests; use write_file and http_get instead. Example: execute_code(code="import math; math.sqrt(16)").'
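Explicit scope boundaries only help if they point at real tools. A "use write_file and http_get instead" redirect to a tool the agent does not have can cause the same wrong-tool errors this report describes. This is a minimal sketch that assumes tools are plain dicts with `name` and `description` keys. The `execute_code` description is the one quoted above. The `write_file` and `http_get` descriptions and the helper names are hypothetical. The check treats any snake_case word in a "Do NOT use" sentence as a redirect target, which is a heuristic rather than a parser.

```python
import re

# Matches snake_case identifiers such as write_file or http_get.
SNAKE_CASE = re.compile(r"\b[a-z]+(?:_[a-z]+)+\b")

# execute_code's description is the sweet-spot example from this report;
# the write_file and http_get descriptions are hypothetical.
TOOLS = [
    {
        "name": "execute_code",
        "description": (
            "Executes Python code in a sandbox. "
            "Use for data analysis, math, and string manipulation. "
            "Do NOT use for file I/O or network requests; use write_file and http_get instead. "
            'Example: execute_code(code="import math; math.sqrt(16)").'
        ),
    },
    {
        "name": "write_file",
        "description": (
            "Writes text to a file at the given path, replacing any existing content. "
            "Use for saving results the user asked to keep. "
            "Do NOT use for computation; use execute_code instead. "
            'Example: write_file(path="results.txt", content="42").'
        ),
    },
    {
        "name": "http_get",
        "description": (
            "Fetches a URL with an HTTP GET request and returns the response body as text. "
            "Use for reading public web pages or JSON endpoints. "
            "Do NOT use for computation; use execute_code instead. "
            'Example: http_get(url="https://example.com/data.json").'
        ),
    },
]


def split_sentences(text):
    # Naive heuristic: split after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def redirect_targets(description):
    """Return the snake_case tool names mentioned in 'Do NOT use ...' sentences."""
    targets = set()
    for sentence in split_sentences(description):
        if sentence.lower().startswith("do not use"):
            targets.update(SNAKE_CASE.findall(sentence))
    return targets


def dangling_redirects(tools):
    """Map each tool name to its redirect targets that are missing from the toolset."""
    names = {tool["name"] for tool in tools}
    problems = {}
    for tool in tools:
        missing = redirect_targets(tool["description"]) - names
        if missing:
            problems[tool["name"]] = sorted(missing)
    return problems


if __name__ == "__main__":
    print(dangling_redirects(TOOLS))
    # Dropping http_get leaves execute_code redirecting to a tool that no longer exists.
    print(dangling_redirects([t for t in TOOLS if t["name"] != "http_get"]))
```

Against the full toolset, the check reports nothing. If you drop `http_get`, it flags `execute_code` as redirecting to a missing tool. Running a check like this alongside the tool definitions, for example in CI, makes it less likely that descriptions and the actual toolset drift apart.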
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:13:51.942083+00:00 - report_created - created