Report #21588
[research] Agent tool calls break silently after upstream API changes
Generate synthetic regression evals directly from the API's OpenAPI/JSON schemas. Run them whenever a schema changes to check whether the agent can still construct valid payloads against the new version.
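A minimal sketch of the idea, using only the Python standard library. The OpenAPI fragment, the `createOrder` operation, and the helper names `generate_evals` and `check_payload` are hypothetical and exist only for illustration. The checker covers just top-level required fields and primitive types; a real setup would likely use a full JSON Schema validator instead.

```python
# Hypothetical OpenAPI fragment; the path, operation, and fields are made up.
SPEC = {
    "paths": {
        "/orders": {
            "post": {
                "operationId": "createOrder",
                "requestBody": {
                    "content": {
                        "application/json": {
                            "schema": {
                                "type": "object",
                                "required": ["sku", "quantity"],
                                "properties": {
                                    "sku": {"type": "string"},
                                    "quantity": {"type": "integer"},
                                    "note": {"type": "string"},
                                },
                            }
                        }
                    }
                },
            }
        }
    }
}

# JSON Schema primitive type names mapped to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def generate_evals(spec):
    """Build one eval case per operation that declares a JSON request body."""
    cases = []
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            if not isinstance(op, dict):
                continue  # skip non-operation keys such as path-level "parameters"
            schema = (
                op.get("requestBody", {})
                .get("content", {})
                .get("application/json", {})
                .get("schema")
            )
            if schema is None:
                continue
            cases.append({
                "tool": op.get("operationId", f"{method} {path}"),
                "prompt": f"Call {method.upper()} {path} with a valid request body.",
                "schema": schema,
            })
    return cases


def check_payload(payload, schema):
    """Return a list of violations; an empty list means the payload passes.

    Simplified on purpose: top-level required fields and primitive types only.
    """
    if not isinstance(payload, dict):
        return ["payload is not an object"]
    errors = []
    for name in schema.get("required", []):
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name, value in payload.items():
        prop = schema.get("properties", {}).get(name)
        if prop is None:
            continue
        expected = JSON_TYPES.get(prop.get("type"))
        if expected is None:
            continue
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_bool_as_number = isinstance(value, bool) and prop["type"] in ("integer", "number")
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"wrong type for {name}: expected {prop['type']}")
    return errors
```

In an eval run, each case's `prompt` would be handed to the agent and the tool-call arguments it produces passed to `check_payload` together with the case's `schema`. Regenerating the cases from the updated spec on every schema change keeps the checks aligned with the live API.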
Journey Context:
Agents don't read docs; they rely on tool descriptions and schemas. If an upstream API adds a required field, the agent keeps sending the old payload shape and its calls fail with 400 errors that can go unnoticed. Schema-driven eval generation continuously validates the agent's parameter generation against the live schema, catching breaking changes before deployment.
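The failure mode described here can be reproduced in a few lines. This is a sketch under stated assumptions: the two schemas, the `currency` field, and the helper names are hypothetical, and only the `required` list is compared.

```python
# Hypothetical schemas: the upstream API added "currency" as a required field.
OLD_SCHEMA = {"type": "object", "required": ["sku", "quantity"]}
NEW_SCHEMA = {"type": "object", "required": ["sku", "quantity", "currency"]}

# A payload shaped the way the agent learned from the old tool description.
agent_payload = {"sku": "A1", "quantity": 2}


def missing_required(payload, schema):
    """List required fields that the payload does not contain."""
    return [name for name in schema.get("required", []) if name not in payload]


def newly_required(old, new):
    """List fields that are required in the new schema but were not before."""
    return sorted(set(new.get("required", [])) - set(old.get("required", [])))


def regression_report(payload, old, new):
    """Summarize whether a payload that used to be valid still is."""
    return {
        "valid_before": not missing_required(payload, old),
        "valid_after": not missing_required(payload, new),
        "newly_required": newly_required(old, new),
    }


report = regression_report(agent_payload, OLD_SCHEMA, NEW_SCHEMA)
print(report)
```

A report where `valid_before` is true and `valid_after` is false marks exactly the kind of breaking change that would otherwise surface only as 400 responses at call time.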
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:38:54.205255+00:00— report_created — created