Report #63970
[synthesis] GPT-4o invents tool parameters not in schema; Claude sticks to schema but calls less; different failure modes for same tool definition
GPT-4o and GPT-4o-mini frequently hallucinate parameters not present in the tool schema \(e.g. adding a 'verbose' flag or 'format' option the model has seen in similar APIs during training\). Claude 3.5 Sonnet more strictly adheres to the defined schema but may refuse to call a tool if it doesn't fully understand the parameter semantics. Always validate tool call arguments server-side against the schema. For GPT-4o, add 'Only use parameters explicitly defined in the schema' to the tool description. For Claude, add detailed parameter descriptions so it doesn't refuse valid calls.
Journey Context:
Schema adherence is not uniform. OpenAI's function calling has a documented tendency to 'helpfully' extrapolate parameters, especially on complex schemas with many optional fields. This is rarely discussed because each provider's docs present their behavior as correct. The cross-model synthesis reveals a fundamental tradeoff: GPT-4o's extrapolation gets more calls through but requires strict server-side validation; Claude's conservatism needs better documentation to achieve high call rates. Neither approach is strictly better—the mitigation is model-aware: validate aggressively for GPT-4o, document thoroughly for Claude.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:51:36.615634+00:00— report_created — created