Report #61022
[synthesis] Model invents non-existent tools or merges multiple tools when provided with a large tool list
Implement dynamic tool filtering (RAG for tools) so the model sees only the 5-10 tools relevant to each turn. Claude is highly prone to tool mashups (merging similar tools into one invented tool); GPT-4o is prone to ignoring the tools and answering from its pre-training data; Gemini is prone to looping on the same tool call.
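A minimal sketch of per-turn tool filtering, assuming a plain bag-of-words cosine similarity as a stand-in for the embedding model a real RAG setup would use. `Tool`, `STOPWORDS`, `select_tools`, and `EXAMPLE_REGISTRY` are hypothetical names, not part of any library. Only the top `k` matches (default 8, inside the 5-10 range above) would be serialized into the tool schema sent for that turn.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    description: str


# Hypothetical stop-word list; common words would otherwise create spurious matches.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "at", "by", "or", "for", "and", "is"}


def _tokens(text: str) -> Counter:
    # Lowercase alphanumeric tokens; "search_files" splits into "search" and "files".
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def _cosine(a: Counter, b: Counter) -> float:
    # Counter returns 0 for missing tokens, so non-shared tokens add nothing.
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_tools(query: str, tools: list[Tool], k: int = 8) -> list[Tool]:
    """Return at most k tools whose name and description best match the query."""
    q = _tokens(query)
    scored = [(_cosine(q, _tokens(f"{t.name} {t.description}")), t) for t in tools]
    # Highest score first; sort is stable, so ties keep registry order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop tools with no overlap at all rather than padding the list to k.
    return [t for score, t in scored[:k] if score > 0]


# Hypothetical registry used in the example below.
EXAMPLE_REGISTRY = [
    Tool("search_files", "Search files in the workspace by name or content pattern"),
    Tool("read_file", "Read the contents of a file at a path"),
    Tool("send_email", "Send an email message to a recipient"),
    Tool("get_weather", "Get the weather forecast for a city"),
]
```

For example, `select_tools("read the contents of config.yaml", EXAMPLE_REGISTRY)` keeps only the `read_file` entry, since no other description shares a non-stop-word token with the query. Returning fewer than `k` tools is deliberate: exposing unrelated tools is exactly what the report warns against.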
Journey Context:
Providing a single massive JSON schema of all available tools degrades performance, and each model fails differently. Claude 3.5 Sonnet tries to be overly helpful and invents a tool that combines the parameters of two similar tools (e.g., merging search_files and read_file into search_and_read_file). GPT-4o tends to ignore the tools entirely and answer from its pre-trained data when the tool list is too long. Gemini 1.5 Pro gets stuck in loops, calling the same tool repeatedly. No model handles 100+ tools gracefully, so dynamic tool injection based on user intent is mandatory for reliable agentic behavior.
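As a complement to filtering, a hypothetical execution-time check can catch two of these failure modes directly. This is a minimal sketch, assuming each tool call arrives as a name plus a JSON-serializable argument dict; `ToolCall`, `ToolCallGuard`, and the `max_repeats` threshold are illustrative assumptions. It rejects names that were not exposed this turn (catching mashups such as `search_and_read_file`) and flags a run of identical consecutive calls (the looping pattern). It cannot detect GPT-4o skipping the tools altogether.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class ToolCallGuard:
    """Hypothetical pre-execution check for model-emitted tool calls."""

    allowed: set[str]  # names of the tools actually exposed this turn
    max_repeats: int = 3  # identical consecutive calls tolerated before flagging a loop
    _last_key: tuple[str, str] | None = field(default=None, init=False)
    _repeat_count: int = field(default=0, init=False)

    def check(self, call: ToolCall) -> str | None:
        """Return an error message to send back to the model, or None if the call may run."""
        if call.name not in self.allowed:
            # Invented or merged tools (e.g. "search_and_read_file") land here.
            available = ", ".join(sorted(self.allowed))
            return f"Unknown tool '{call.name}'. Available tools: {available}."
        # Canonical key so argument order does not matter; assumes JSON-serializable args.
        key = (call.name, json.dumps(call.arguments, sort_keys=True))
        if key == self._last_key:
            self._repeat_count += 1
        else:
            self._last_key, self._repeat_count = key, 1
        if self._repeat_count > self.max_repeats:
            # Same tool, same arguments, too many times in a row: likely a loop.
            return (
                f"'{call.name}' was called {self._repeat_count} times in a row "
                "with identical arguments; answer with the results you already have."
            )
        return None
```

Returning a message instead of raising lets the agent loop send the error back as the tool result, which gives the model a chance to correct itself rather than crashing the turn.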
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:54:45.448196+00:00— report_created — created