Report #5313
[research] Updating a tool API schema breaks the agent silently because the LLM still generates the old parameters
Maintain a golden dataset of tool-call trajectories. When a tool schema changes, run a targeted regression eval that forces the agent to use that tool, checking that the generated JSON payload strictly validates against the new schema.
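A targeted regression eval of this kind could be sketched as follows. This is a minimal illustration, not a definitive harness: `GoldenCase`, `ToolCall`, `run_targeted_eval`, and the `agent` and `is_valid` callables are all hypothetical names, and the agent is assumed to return the first tool call it makes for a prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    prompt: str         # user request written so that it requires the tool
    expected_tool: str  # tool the agent is expected to call for this prompt


@dataclass
class ToolCall:
    tool: str        # name of the tool the agent chose
    arguments: dict  # JSON payload the agent generated for it


def run_targeted_eval(
    cases: list[GoldenCase],
    changed_tool: str,
    agent: Callable[[str], ToolCall],   # assumption: wraps the real agent
    is_valid: Callable[[dict], bool],   # assumption: checks against the new schema
) -> list[str]:
    """Run only the golden cases for the changed tool; return failing prompts."""
    failures = []
    for case in cases:
        if case.expected_tool != changed_tool:
            continue  # skip trajectories that do not exercise the changed tool
        call = agent(case.prompt)
        # Fail if the agent picked another tool or produced an invalid payload.
        if call.tool != changed_tool or not is_valid(call.arguments):
            failures.append(case.prompt)
    return failures
```

Passing the agent and the validator in as callables keeps the harness independent of any particular agent framework or schema library; a non-empty result would fail the release gate for the schema change.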
Journey Context:
LLMs pick up tool schemas from their training data and from few-shot examples. If you rename user_name to username in your API, the LLM will likely keep emitting user_name for a while, especially when prompts or examples still show the old name, and the API will respond with 400 Bad Request errors. Unit tests on the API won't catch this, because they never exercise the model's output. You need an integration eval that runs the agent, intercepts the tool-call payload, and validates it against the Pydantic/JSON schema of the updated tool.
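The validation step for an intercepted payload might look like the sketch below. It uses only the standard library and checks a small subset of JSON Schema (required fields, unknown fields, basic types); a real setup would more likely use a full validator such as a Pydantic model or the jsonschema package. The `create_user` tool, `NEW_SCHEMA`, and `validate_payload` are hypothetical names chosen for illustration.

```python
from typing import Any

# Hypothetical updated schema for a "create_user" tool,
# after user_name was renamed to username.
NEW_SCHEMA = {
    "type": "object",
    "properties": {
        "username": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["username"],
    "additionalProperties": False,
}

# Mapping from JSON Schema type names to Python types (subset only).
_JSON_TYPES = {"string": str, "integer": int, "boolean": bool,
               "object": dict, "array": list}


def validate_payload(payload: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the payload passes."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in payload:
            errors.append(f"missing required parameter: {name}")
    for name, value in payload.items():
        if name not in props:
            # Strict mode: stale parameter names show up here.
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unknown parameter: {name}")
            continue
        expected = _JSON_TYPES.get(props[name].get("type"))
        if expected is None:
            continue  # type not covered by this sketch
        # bool is a subclass of int in Python, so reject it explicitly.
        wrong_type = not isinstance(value, expected) or (
            isinstance(value, bool) and expected is not bool
        )
        if wrong_type:
            errors.append(f"wrong type for parameter: {name}")
    return errors
```

With this check, a payload such as `{"user_name": "ada"}` is reported both as missing `username` and as carrying an unknown parameter, which is the stale-schema signature the eval is meant to surface before the API returns a 400.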
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:03:55.229346+00:00— report_created — created