Report #11709
[research] Agent suddenly fails to call tools correctly after an upstream API change, but agent code hasn't changed
Implement "Schema Evals": automatically generate eval cases directly from your tools' OpenAPI/JSON schemas. Before deploying any change to a tool API, run a lightweight agent eval to verify that the agent can still format arguments correctly for the updated schema.
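A minimal sketch of how such an eval case could be derived from a schema and checked, using only the Python standard library. The tool definition `TOOL`, the helper names `generate_eval_cases` and `check_arguments`, and the sample values are all hypothetical; a real setup would likely use a full JSON Schema validator and send the generated prompt to the actual agent.

```python
# Hypothetical tool definition in the JSON Schema style used for function calling.
TOOL = {
    "name": "get_user",
    "parameters": {
        "type": "object",
        "properties": {
            "id": {"type": "string"},
            "include_orders": {"type": "boolean"},
        },
        "required": ["id"],
        "additionalProperties": False,
    },
}

# Placeholder values used to fill required parameters in generated cases.
_SAMPLE_VALUES = {"string": "abc123", "integer": 7, "number": 1.5, "boolean": True}
_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def generate_eval_cases(tool):
    """Build one eval case that exercises every required parameter of the tool."""
    params = tool["parameters"]
    expected = {
        name: _SAMPLE_VALUES[params["properties"][name]["type"]]
        for name in params.get("required", [])
    }
    wanted = ", ".join(f"{k}={v!r}" for k, v in expected.items())
    return [{"prompt": f"Call {tool['name']} with {wanted}.", "expected_args": expected}]


def check_arguments(tool, args):
    """Return a list of problems; an empty list means the call fits the schema."""
    params = tool["parameters"]
    props = params.get("properties", {})
    problems = []
    for name in params.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in props:
            if params.get("additionalProperties", True) is False:
                problems.append(f"unknown argument: {name}")
            continue
        json_type = props[name].get("type")
        expected = _PY_TYPES.get(json_type)
        if expected is None:
            continue  # type not covered by this simplified checker
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and json_type in ("integer", "number")
        if bool_as_number or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {json_type}")
    return problems
```

In use, each generated prompt would be sent to the agent, and the arguments of the resulting tool call passed to `check_arguments`; any non-empty result counts as a failed case.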
Journey Context:
Agents are tightly coupled to the tool schemas they are given. If a backend team renames an API parameter from `user_id` to `id`, the LLM may keep emitting `user_id`, and the call will fail. Standard API integration tests check whether the server accepts the new schema, but they don't check whether the LLM can generate a call that matches it. Schema evals bridge this gap by testing the LLM's ability to invoke the new schema correctly.
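The rename scenario above could be caught by a pre-deploy gate along these lines. This is a sketch under stated assumptions: `OLD_PARAMS`, `NEW_PARAMS`, `diff_params`, `stale_arguments`, and `gate` are hypothetical names, and the two agent functions are stand-ins for a real LLM call that returns tool arguments as a dict.

```python
# Hypothetical before/after parameter schemas for the same tool.
OLD_PARAMS = {
    "type": "object",
    "properties": {"user_id": {"type": "string"}},
    "required": ["user_id"],
}
NEW_PARAMS = {
    "type": "object",
    "properties": {"id": {"type": "string"}},
    "required": ["id"],
}


def diff_params(old, new):
    """List parameter names that were removed from or added to the schema."""
    old_names = set(old.get("properties", {}))
    new_names = set(new.get("properties", {}))
    return {
        "removed": sorted(old_names - new_names),
        "added": sorted(new_names - old_names),
    }


def stale_arguments(call_args, old, new):
    """Return argument names the agent used that exist only in the old schema."""
    removed = set(diff_params(old, new)["removed"])
    return sorted(removed & set(call_args))


def gate(agent, prompts, old, new, threshold=1.0):
    """Run the agent on each prompt against the new schema.

    `agent` is any callable (prompt, schema) -> dict of tool arguments.
    Returns (pass_rate, ok), where ok means pass_rate met the threshold.
    """
    passed = 0
    for prompt in prompts:
        args = agent(prompt, new)
        missing = [name for name in new.get("required", []) if name not in args]
        if not missing and not stale_arguments(args, old, new):
            passed += 1
    rate = passed / len(prompts)
    return rate, rate >= threshold


def stale_agent(prompt, schema):
    """Stand-in for an LLM that still emits the old parameter name."""
    return {"user_id": "42"}


def updated_agent(prompt, schema):
    """Stand-in for an LLM that follows the schema it was given."""
    return {name: "42" for name in schema["required"]}
```

With these stubs, `gate(stale_agent, ["Look up user 42"], OLD_PARAMS, NEW_PARAMS)` fails the gate while `updated_agent` passes it. The threshold is a design choice: a strict 1.0 blocks any regression, whereas a lower value tolerates some nondeterminism in model output.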
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T14:10:06.908803+00:00— report_created — created