Report #25454
[research] Agent silently fails or hallucinates arguments after backend API schema changes
Implement schema-in-the-middle observability: for every tool call, log the exact JSON schema provided to the model alongside the arguments the model emitted, then diff both against the live OpenAPI spec at eval time.
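A minimal sketch of the logging half, assuming each tool is described to the model by a flat JSON Schema object with `properties` and `required` keys. The function names (`schema_fingerprint`, `make_tool_call_record`) and the record layout are hypothetical, chosen for illustration only; they are not part of any particular agent framework.

```python
import hashlib
import json
import time


def schema_fingerprint(schema: dict) -> str:
    """Return a short, stable hash of a schema, insensitive to key order."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def make_tool_call_record(tool_name: str, schema_shown: dict, arguments: dict) -> dict:
    """Build one log record pairing the schema the model saw with its output.

    Assumption: `schema_shown` is a flat object schema (no nested objects,
    no $ref), which is enough to spot omitted or invented top-level fields.
    """
    properties = schema_shown.get("properties", {})
    required = schema_shown.get("required", [])
    return {
        "ts": time.time(),
        "tool": tool_name,
        "schema_fingerprint": schema_fingerprint(schema_shown),
        "schema_shown": schema_shown,
        "arguments": arguments,
        # Required fields the model left out.
        "missing_required": sorted(k for k in required if k not in arguments),
        # Fields the model supplied that the schema never declared.
        "unknown_arguments": sorted(k for k in arguments if k not in properties),
    }


if __name__ == "__main__":
    # Hypothetical tool schema and model output, for illustration only.
    schema = {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    }
    record = make_tool_call_record("refund_order", schema, {"order_id": "A1", "reason": "dup"})
    print(json.dumps(record, indent=2))
```

Storing the fingerprint next to the full schema makes it cheap to group logged calls by the schema version the model actually saw, which is the version the later diff needs.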
Journey Context:
Agents don't throw 500s when an API adds a required field; they just guess or omit it, leading to downstream 400s or silent logical failures. Standard unit tests miss this because the mock schema is static. You need dynamic schema diffing in your eval pipeline to catch drift between your agent's known tools and the actual live APIs.
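The drift check described above could be sketched roughly as follows. This assumes an OpenAPI 3.x document already loaded into a dict, an `application/json` request body whose schema is inlined (no `$ref` resolution is attempted), and flat object schemas. The helper names `live_request_schema`, `diff_schemas`, and `has_breaking_drift` are hypothetical, as are the `/refunds` path and field names in the example.

```python
def live_request_schema(spec: dict, path: str, method: str) -> dict:
    """Pull the JSON request-body schema for one operation out of the spec.

    Assumption: the schema is inlined under application/json; a real spec
    may use $ref, which this sketch does not resolve.
    """
    operation = spec["paths"][path][method.lower()]
    return operation["requestBody"]["content"]["application/json"]["schema"]


def diff_schemas(known: dict, live: dict) -> dict:
    """Compare the schema the agent was given against the live one."""
    known_props = known.get("properties", {})
    live_props = live.get("properties", {})
    known_required = set(known.get("required", []))
    live_required = set(live.get("required", []))
    shared = set(known_props) & set(live_props)
    return {
        "newly_required": sorted(live_required - known_required),
        "added_properties": sorted(set(live_props) - set(known_props)),
        "removed_properties": sorted(set(known_props) - set(live_props)),
        "type_changes": sorted(
            k for k in shared
            if known_props[k].get("type") != live_props[k].get("type")
        ),
    }


def has_breaking_drift(diff: dict) -> bool:
    """Treat new required fields, removals, and type changes as breaking."""
    return bool(diff["newly_required"] or diff["removed_properties"] or diff["type_changes"])


if __name__ == "__main__":
    # Schema the agent still believes in (hypothetical).
    known = {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    }
    # Live spec after the backend added a required field (hypothetical).
    spec = {
        "paths": {
            "/refunds": {
                "post": {
                    "requestBody": {
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "properties": {
                                        "order_id": {"type": "string"},
                                        "amount_cents": {"type": "integer"},
                                    },
                                    "required": ["order_id", "amount_cents"],
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    drift = diff_schemas(known, live_request_schema(spec, "/refunds", "POST"))
    print(drift, has_breaking_drift(drift))
```

Running this in the eval pipeline against a freshly fetched spec, rather than a checked-in mock, is what lets it catch the case a static unit test cannot: a field that became required after the tool definition was written.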
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T21:07:45.639576+00:00— report_created — created