Report #81690
[synthesis] Model calls a tool that does not exist in the provided tool definitions — hallucinated tool name
Always validate tool names against the provided definitions before execution. With GPT-4o, expect fabricated tool names and handle them by returning a clear error message as the tool result. With Claude, expect a natural-language fallback rather than a fake call, but validate anyway. The two failure modes need different recovery strategies.
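A minimal sketch of the validation step, assuming tool calls arrive as a name plus a dict of arguments. The `TOOLS` registry, the `dispatch_tool_call` helper, and the `is_error`/`content` result shape are hypothetical and not tied to any particular SDK; adapt them to whatever tool-result format your client expects.

```python
from typing import Any, Callable

# Hypothetical registry mapping each defined tool name to its implementation.
TOOLS: dict[str, Callable[..., Any]] = {
    "search_database": lambda query: f"results for {query!r}",
}


def dispatch_tool_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Run a tool only if its name is defined; otherwise return an error result."""
    if name not in TOOLS:
        # Fabricated name: do not raise. Hand the model a readable error
        # as the tool result so it can correct itself on the next turn.
        available = ", ".join(sorted(TOOLS))
        return {
            "is_error": True,
            "content": f"Unknown tool '{name}'. Available tools: {available}.",
        }
    try:
        return {"is_error": False, "content": TOOLS[name](**arguments)}
    except TypeError as exc:
        # Typically wrong or missing argument names. Note this also catches
        # a TypeError raised inside the tool itself.
        return {"is_error": True, "content": f"Bad arguments for '{name}': {exc}"}
```

Listing the available tool names in the error message is a design choice: it gives the model something concrete to retry with instead of guessing another name.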
Journey Context:
When a user request maps to no defined tool, GPT-4o is more likely to hallucinate a plausible tool call with a fabricated name that sounds real (e.g., 'web_search' when only 'search_database' is defined). Claude more often falls back to describing in natural language what it would do, without emitting a tool_use block. The failure signatures are therefore different: GPT-4o fails with an invalid tool call (a runtime error if unvalidated), while Claude fails with no tool call (the agent stalls waiting for an action). The fix requires both validation (to catch the GPT-4o case) and timeout/fallback logic (to catch the Claude case). Prompt engineering alone cannot eliminate this: both models still hallucinate under ambiguity.
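The two failure signatures can be told apart in the agent loop before anything executes. The sketch below is illustrative only: `TurnOutcome`, `classify_turn`, `next_action`, the action strings, and the `max_no_call_turns` cap are all hypothetical names, and a turn cap is just one possible form of the timeout/fallback logic mentioned above. It assumes each model turn has already been parsed into a list of tool-call dicts with a `name` key.

```python
from enum import Enum


class TurnOutcome(Enum):
    VALID_CALL = "valid_call"      # every requested tool is defined
    UNKNOWN_TOOL = "unknown_tool"  # at least one fabricated tool name
    NO_CALL = "no_call"            # natural-language reply, no tool call


def classify_turn(tool_calls: list[dict], defined: set[str]) -> TurnOutcome:
    """Classify one model turn by comparing its tool calls to the defined names."""
    if not tool_calls:
        return TurnOutcome.NO_CALL
    if any(call.get("name") not in defined for call in tool_calls):
        return TurnOutcome.UNKNOWN_TOOL
    return TurnOutcome.VALID_CALL


def next_action(outcome: TurnOutcome, no_call_turns: int,
                max_no_call_turns: int = 2) -> str:
    """Pick a recovery step.

    no_call_turns counts consecutive turns without a tool call,
    including the current one.
    """
    if outcome is TurnOutcome.VALID_CALL:
        return "execute"
    if outcome is TurnOutcome.UNKNOWN_TOOL:
        return "return_error_result"
    # NO_CALL: nudge the model a bounded number of times, then stop waiting.
    return "nudge" if no_call_turns < max_no_call_turns else "escalate"


defined = {"search_database"}
outcome = classify_turn([{"name": "web_search", "arguments": {}}], defined)
print(outcome, next_action(outcome, no_call_turns=0))
```

Bounding the number of nudges keeps the agent from stalling indefinitely when the model keeps answering in prose; what "escalate" means (surface the text to the user, abort, hand off) is left to the caller.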
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:43:02.551180+00:00— report_created — created