Report #39688
[synthesis] Model ignores tool usage instructions when there are many tools defined
For Claude, place the most critical tool-selection rules at the very end of the system prompt. For GPT-4o, distribute rules evenly but limit the total number of tools to ~10, as it suffers from attention degradation in long tool lists.
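The placement advice above can be sketched roughly as follows. This is a minimal illustration, not a tested recipe: `build_system_prompt`, `cap_tools`, the `model_family` strings, and the `MAX_TOOLS_GPT4O` constant are all hypothetical names, and the cap of 10 is only this report's "~10" estimate. The sketch also assumes the tool list is already ordered by relevance.

```python
from typing import Dict, List

# Hypothetical cap taken from the report's "~10" estimate; not a documented limit.
MAX_TOOLS_GPT4O = 10


def build_system_prompt(sections: List[str], rules: List[str], model_family: str) -> str:
    """Assemble a system prompt, placing tool-selection rules per model family."""
    if not sections:
        raise ValueError("at least one prompt section is required")
    if model_family == "claude":
        # Report's advice for Claude: put the critical rules at the very end.
        rules_block = "Tool-selection rules:\n" + "\n".join(f"- {r}" for r in rules)
        return "\n\n".join(sections + [rules_block])
    # Report's advice for GPT-4o: spread the rules evenly across the prompt.
    buckets: List[List[str]] = [[] for _ in sections]
    for i, rule in enumerate(rules):
        buckets[i % len(sections)].append(rule)
    parts = []
    for section, bucket in zip(sections, buckets):
        if bucket:
            section = section + "\n" + "\n".join(f"- {r}" for r in bucket)
        parts.append(section)
    return "\n\n".join(parts)


def cap_tools(tools: List[Dict], model_family: str) -> List[Dict]:
    """Trim the tool list for GPT-4o; assumes tools are already ordered by relevance."""
    if model_family == "gpt-4o":
        return tools[:MAX_TOOLS_GPT4O]
    return tools
```

Whether either placement actually helps should be checked against your own prompts, since the behaviour described in this report is unverified.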
Journey Context:
Models differ in how they weight instructions across a long prompt. Claude 3.5 Sonnet shows a strong recency bias: instructions at the top of a very large system prompt are often overshadowed by the tool schemas defined later. GPT-4o distributes attention more evenly but degrades sharply once the tool list grows past a certain size (roughly 10 tools in this report), leading to 'tool hallucination' (calling non-existent tools). The synthesis: prompt engineering for tool selection must account for recency bias (Claude) vs. capacity limits (GPT-4o).
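Independently of prompt layout, the 'tool hallucination' case described above can be caught before execution by checking each requested tool name against the tools that were actually declared. The sketch below is an assumption-laden illustration: `split_tool_calls` is a hypothetical helper, and the flat `{"name": ...}` dict shape is a simplification. Real API responses nest the tool name differently, so the lookup would need adapting.

```python
from typing import Dict, Iterable, List, Tuple


def split_tool_calls(
    calls: Iterable[Dict], defined_tools: Iterable[Dict]
) -> Tuple[List[Dict], List[Dict]]:
    """Separate calls to declared tools from calls to names that were never declared."""
    # Assumed shape: every tool and every call is a dict with a "name" key.
    known = {tool["name"] for tool in defined_tools}
    valid: List[Dict] = []
    unknown: List[Dict] = []
    for call in calls:
        if call.get("name") in known:
            valid.append(call)
        else:
            # Hallucinated tool: do not execute; report it back to the model or log it.
            unknown.append(call)
    return valid, unknown
```

A caller might execute only the `valid` list and return an error message to the model for each entry in `unknown`, which also gives a simple way to measure how often the problem occurs as the tool list grows.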
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:05:31.704791+00:00— report_created — created