Report #82275
[synthesis] Model calls a tool that doesn't exist or uses parameters not in the schema
Validate every tool call against the provided schema before execution. GPT-4o occasionally hallucinates near-miss tool names ('search_files' when only 'search_file' exists); use fuzzy matching to recover them. Claude more often calls the correct tool but adds parameters that are not in the schema; strip the extras rather than rejecting the call. Open-weight models fabricate both tool names and parameters; reject the call and re-prompt.
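A minimal sketch of the validate-then-recover step in Python. The simplified schema format (tool name mapped to sets of allowed and required parameter names), the strategy names 'fuzzy', 'strip', and 'reject', the function names, and the model-to-strategy mapping are all hypothetical, chosen to mirror the per-model advice above; a real deployment would validate against each tool's full JSON Schema. In this sketch only 'strip' tolerates unknown parameters, and fuzzy matching refuses distant or ambiguous matches rather than guessing.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical, simplified schema: tool name -> {"properties": allowed
# parameter names, "required": required parameter names}.
Schema = dict[str, dict[str, set[str]]]

# Assumed mapping that mirrors this report's per-model observations.
STRATEGY_BY_MODEL = {"gpt-4o": "fuzzy", "claude": "strip", "open-weight": "reject"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


@dataclass
class ToolCallResult:
    ok: bool
    name: Optional[str] = None
    args: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None  # text to send back to the model on re-prompt


def resolve_name(name: str, schema: Schema, fuzzy: bool, max_distance: int = 2) -> Optional[str]:
    """Exact match first; otherwise the unique closest tool within max_distance."""
    if name in schema:
        return name
    if not fuzzy or not schema:
        return None
    distances = {known: levenshtein(name, known) for known in schema}
    best = min(distances.values())
    candidates = [known for known, d in distances.items() if d == best]
    # Refuse to guess when nothing is close enough or the match is ambiguous.
    if best <= max_distance and len(candidates) == 1:
        return candidates[0]
    return None


def validate_tool_call(name: str, args: dict[str, Any], schema: Schema, strategy: str) -> ToolCallResult:
    if strategy not in {"fuzzy", "strip", "reject"}:
        raise ValueError(f"unknown strategy {strategy!r}")
    resolved = resolve_name(name, schema, fuzzy=(strategy == "fuzzy"))
    if resolved is None:
        return ToolCallResult(ok=False, error=f"Unknown tool {name!r}; available: {sorted(schema)}")
    allowed = schema[resolved]["properties"]
    extras = set(args) - allowed
    if extras:
        if strategy != "strip":
            return ToolCallResult(ok=False, error=f"Unknown parameters {sorted(extras)} for {resolved!r}")
        # Drop parameters the model invented instead of failing the call.
        args = {k: v for k, v in args.items() if k in allowed}
    missing = schema[resolved]["required"] - set(args)
    if missing:
        return ToolCallResult(ok=False, error=f"Missing required parameters {sorted(missing)} for {resolved!r}")
    return ToolCallResult(ok=True, name=resolved, args=args)
```

A failed result carries an error string that can be sent back to the model as the re-prompt, so the 'reject' path for open-weight models gives the model something concrete to correct rather than silently dropping the call.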
Journey Context:
Tool name hallucination follows a model-specific fingerprint. GPT-4o, especially at temperature > 0.7, produces plausible variations of existing tool names: 'delete_file' when only 'remove_file' is defined, or 'get_user_info' when only 'get_user' exists. Only typo-like variants are close enough for Levenshtein-distance fuzzy matching (threshold ~2) to recover: 'search_files' is one edit from 'search_file', but 'delete_file' is four edits from 'remove_file' and 'get_user_info' is five from 'get_user', so semantic substitutions like these fall outside that threshold. Claude's fingerprint is different: it almost always calls the correct tool name but may add parameters it infers should exist (e.g., a 'verbose' parameter not in the schema). These extra parameters should be stripped silently rather than causing a validation error. Open-weight models are the worst offenders, sometimes inventing entirely new tool names that bear no resemblance to anything in the schema.

The synthesis: each model's hallucination pattern requires a different recovery strategy. A single 'reject invalid calls' approach wastes recoverable GPT-4o calls, while fuzzy matching on Claude's correct-name calls is unnecessary. OpenAI's strict mode for function calling eliminates hallucinated parameters, but it reduces model flexibility and is not available on all endpoints.
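For comparison, this is roughly what a strict-mode tool definition looks like in the Chat Completions `tools` format, written as a Python dict. The `search_file` tool and its parameters are hypothetical. Per OpenAI's Structured Outputs documentation (check the current version before relying on it), strict mode requires every object to set `additionalProperties` to false and to list every property in `required`, so an optional parameter is expressed as a union with null. The `strict_mode_problems` helper is a hypothetical top-level lint for those two rules, not part of any SDK, and it does not recurse into nested objects.

```python
# Hypothetical tool, in the Chat Completions "tools" entry shape with strict mode on.
search_file_tool = {
    "type": "function",
    "function": {
        "name": "search_file",
        "description": "Find files whose names match a query.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                # Optional in spirit: strict mode still lists it in "required",
                # so the model must send it, possibly as null.
                "path": {"type": ["string", "null"]},
            },
            "required": ["query", "path"],
            "additionalProperties": False,
        },
    },
}


def strict_mode_problems(parameters: dict) -> list[str]:
    """Hypothetical top-level lint for two strict-mode rules (no recursion)."""
    problems = []
    if parameters.get("additionalProperties") is not False:
        problems.append("additionalProperties must be false")
    declared = set(parameters.get("properties", {}))
    not_required = sorted(declared - set(parameters.get("required", [])))
    if not_required:
        problems.append(f"properties missing from 'required': {not_required}")
    return problems
```

The closed, all-required schema is one concrete source of the reduced flexibility: the model can no longer pass anything outside the declared fields, which is exactly what removes the extra-parameter hallucinations.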
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:41:27.029255+00:00— report_created — created