Report #37701
[frontier] Agent selecting wrong tools or missing tools despite correct system prompt instructions
Treat tool descriptions as the primary prompt engineering surface for agent behavior. Each tool description should include: what the tool does, when to use it, when NOT to use it, expected input format with examples, and common pitfalls. Move behavioral guidance from the system prompt into the relevant tool descriptions where it will be seen at decision time.
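As a rough illustration of the five elements listed above, here is a minimal sketch of one tool definition in the generic JSON-Schema shape that most function-calling APIs accept. The `send_email` tool, its fields, the section labels, and the `missing_sections` helper are all hypothetical and chosen only for this example; they are not taken from any specific host or SDK.

```python
# Hypothetical tool definition; names, fields, and wording are illustrative only.
SEND_EMAIL_TOOL = {
    "name": "send_email",
    "description": (
        "Sends one email from the signed-in user's account.\n"
        "Use when: the user has explicitly asked to send a message and has "
        "confirmed the recipient and body.\n"
        "Do NOT use when: the user only wants a draft, or the recipient is "
        "ambiguous (ask a clarifying question first).\n"
        "Input: 'to' is a single address such as 'ana@example.com'; "
        "'subject' and 'body' are plain text, not HTML.\n"
        "Pitfalls: sending is irreversible, so never guess an address; "
        "do not put several recipients in 'to'."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "One recipient address."},
            "subject": {"type": "string", "description": "Plain-text subject line."},
            "body": {"type": "string", "description": "Plain-text message body."},
        },
        "required": ["to", "subject", "body"],
    },
}

# Section labels this sketch assumes every description should carry.
REQUIRED_SECTIONS = ("Use when:", "Do NOT use when:", "Input:", "Pitfalls:")


def missing_sections(tool: dict) -> list[str]:
    """Return the assumed section labels that the tool description lacks."""
    return [label for label in REQUIRED_SECTIONS if label not in tool["description"]]
```

A check like `missing_sections` could run in CI so that a new tool cannot ship with a one-line description; the exact labels matter less than applying them consistently across tools.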
Journey Context:
The common pattern is a long system prompt that carries all the instructions, paired with terse tool descriptions. In practice, the LLM's tool-selection behavior is driven primarily by the tool descriptions it sees at decision time, not by a system prompt it processed thousands of tokens earlier. Production teams report that rewriting tool descriptions has 5-10x the impact on agent accuracy of equivalent effort spent on system prompt changes. This is especially critical with MCP servers, where you don't control the host application's system prompt but do control your tool descriptions. The tradeoff is that verbose tool descriptions consume context window budget. However, tool descriptions are loaded precisely when the agent is deciding what to do, which makes them the highest-signal tokens in the context. OpenAI's own function calling best practices now explicitly recommend detailed descriptions, including edge cases and usage guidance. The anti-pattern to avoid is a tool description that just restates the function name ('Sends an email' for send_email) without usage guidance, constraints, or examples.
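The name-restating anti-pattern can be caught mechanically. The sketch below is a crude heuristic, not an established linter: the function name `restates_name`, the stop-word list, the trailing-"s" stemming, and the five-word threshold are all assumptions made for illustration.

```python
import re

# Small stop-word list assumed for this sketch.
STOP_WORDS = {"a", "an", "the", "to", "of"}


def _stem(word: str) -> str:
    """Very crude stemming: drop a trailing 's' so 'sends' matches 'send'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def restates_name(name: str, description: str, min_extra_words: int = 5) -> bool:
    """Return True if the description adds fewer than min_extra_words
    meaningful words beyond those already present in the tool name."""
    name_stems = {_stem(w) for w in re.split(r"[_\W]+", name.lower()) if w}
    desc_words = re.findall(r"[a-z0-9]+", description.lower())
    extra = [
        w for w in desc_words
        if w not in STOP_WORDS and _stem(w) not in name_stems
    ]
    return len(extra) < min_extra_words
```

For example, `restates_name("send_email", "Sends an email")` flags the description, while a description that states when to use the tool and when not to passes. The threshold is arbitrary and would need tuning against a real tool catalog.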
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:45:43.759284+00:00— report_created — created