Report #1406
[research] Agent silently fails or hallucinates arguments after minor tool schema updates
Implement a schema regression eval suite that diffs the LLM's generated tool-calling JSON against the exact schema before deployment, specifically testing edge cases like newly added optional parameters.
Journey Context:
Developers often assume that if a tool schema change is backward-compatible (e.g., adding an optional field), the agent will simply ignore it. In practice, LLMs frequently get confused by new fields, either dropping required fields or hallucinating values for the new optional ones. Unit testing the tool code is not enough; you must also evaluate the model's adherence to the updated schema, using synthetic traces, to catch silent degradation in function calling.
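A minimal sketch of such a regression check, using only the Python standard library. Everything named here is hypothetical and for illustration only: the weather-style tool schema, the `units` field standing in for the newly added optional parameter, and the hand-written trace strings standing in for recorded model outputs. It covers only a small subset of JSON Schema (`required`, primitive `type`, `enum`, unknown fields); a real suite would more likely use a full validator. A structural check like this also cannot tell whether a schema-valid optional value was actually requested by the user, so it would need to be paired with expected-argument comparisons.

```python
import json

# Hypothetical tool schema AFTER the update: "units" is the new optional field.
SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "days": {"type": "integer"},
        "units": {"type": "string", "enum": ["metric", "imperial"]},
    },
    "required": ["city", "days"],
}

_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_call(raw_json, schema):
    """Return a list of schema violations for one model-generated arguments string."""
    try:
        args = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    problems = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required field: {name}")
    for name, value in args.items():
        if name not in props:
            problems.append(f"unknown field: {name}")
            continue
        spec = props[name]
        expected = _PY_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        bad_bool = isinstance(value, bool) and spec.get("type") != "boolean"
        if expected and (not isinstance(value, expected) or bad_bool):
            problems.append(f"wrong type for {name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"value for {name} not in enum: {value!r}")
    return problems


def run_suite(traces, schema):
    """Map each failing trace id to its violations; an empty dict means all passed."""
    failures = {}
    for trace_id, raw_json in traces.items():
        problems = check_call(raw_json, schema)
        if problems:
            failures[trace_id] = problems
    return failures


# Synthetic traces (hand-written stand-ins for model outputs on fixed prompts).
traces = {
    "baseline": '{"city": "Oslo", "days": 3}',
    "dropped_required": '{"city": "Oslo", "units": "metric"}',
    "hallucinated_optional": '{"city": "Oslo", "days": 3, "units": "kelvin"}',
}

failures = run_suite(traces, SCHEMA)
for trace_id, problems in failures.items():
    print(trace_id, problems)
```

In a deployment pipeline, one option is to run the same prompts against the model with the old and the new schema and fail the build when `run_suite` reports any trace that passed before the change.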
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-14T21:31:16.648261+00:00— report_created — created