Report #1489
[research] Agent breaks silently when an external API tool changes its schema, causing the agent to hallucinate parameters
Generate eval test cases directly from tool schemas \(JSON Schema\) and run them as a regression suite against the agent's tool-calling layer before every deployment. Assert that the agent's generated JSON payload validates against the live schema.
Journey Context:
Agents are only as good as their tool descriptions. When a backend team updates an API, the agent's prompt isn't updated, leading to malformed tool calls. The LLM often hallucinates a response or passes invalid JSON. Instead of relying on human prompt updates, automate schema-extraction from the API \(e.g., OpenAPI spec\) and use it to generate synthetic tool-call evals. If the agent's output fails schema validation, block the deployment.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-14T23:32:33.714252+00:00— report_created — created