Report #46956
[research] Updating an agent's tool schemas (adding/removing parameters) breaks the agent's ability to call tools correctly, but this is not caught by text-based evals
Implement schema-diffing as a CI step. When evaluating agent trajectories, assert that the tool schemas provided to the model in the test environment exactly match the schemas in production, or explicitly update the golden trajectory to reflect the new schema.
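A minimal sketch of such a CI check, assuming each environment can export its tool schemas as a mapping of tool name to a JSON-Schema-style dict with a "parameters" object. The function names and the schema layout are hypothetical, not taken from any specific agent framework.

```python
import json


def canonical(schema: dict) -> str:
    """Serialize a schema deterministically so dict key order does not matter."""
    return json.dumps(schema, sort_keys=True, separators=(",", ":"))


def diff_tool_schemas(prod: dict, test: dict) -> list[str]:
    """Return human-readable differences between two {tool_name: schema} maps."""
    problems = []
    for name in sorted(prod.keys() | test.keys()):
        if name not in test:
            problems.append(f"{name}: missing from test environment")
        elif name not in prod:
            problems.append(f"{name}: missing from production")
        elif canonical(prod[name]) != canonical(test[name]):
            # Assumed layout: schema["parameters"]["properties"] holds the parameter names.
            old = set(prod[name].get("parameters", {}).get("properties", {}))
            new = set(test[name].get("parameters", {}).get("properties", {}))
            removed, added = sorted(old - new), sorted(new - old)
            problems.append(
                f"{name}: schema differs (only in prod: {removed}, only in test: {added})"
            )
    return problems
```

In CI, the job would fail whenever `diff_tool_schemas(prod, test)` returns a non-empty list, which forces either a schema fix or a deliberate update of the golden trajectory.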
Journey Context:
Agents can memorize tool schemas from training data or few-shot examples. If a parameter is renamed from 'file_path' to 'path', the agent may hallucinate the old name or fail to call the tool at all. Standard unit tests of the tool code will not catch this, because the tool code itself works fine; what broke is the LLM's understanding of the interface. Schema diffs act as a breaking-change detector for LLM interfaces.
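To illustrate the rename case, here is a hedged sketch that replays the tool calls recorded in a golden trajectory against the current schemas and reports calls that no longer fit. The trajectory format (a list of dicts with "tool" and "arguments" keys), the schema layout, and the name `stale_calls` are assumptions made for this example.

```python
def stale_calls(trajectory: list[dict], schemas: dict) -> list[str]:
    """List recorded tool calls whose arguments no longer fit the current schema."""
    errors = []
    for i, call in enumerate(trajectory):
        schema = schemas.get(call["tool"])
        if schema is None:
            errors.append(f"step {i}: unknown tool {call['tool']!r}")
            continue
        params = schema.get("parameters", {})
        allowed = set(params.get("properties", {}))
        required = set(params.get("required", []))
        given = set(call["arguments"])
        # Arguments the schema no longer declares, e.g. a renamed parameter.
        for arg in sorted(given - allowed):
            errors.append(f"step {i}: {call['tool']} got unexpected argument {arg!r}")
        # Required arguments the recorded call never supplied.
        for arg in sorted(required - given):
            errors.append(f"step {i}: {call['tool']} is missing required argument {arg!r}")
    return errors


# Hypothetical example: the schema now expects 'path', the recording still uses 'file_path'.
schemas = {"read_file": {"parameters": {"properties": {"path": {"type": "string"}},
                                        "required": ["path"]}}}
golden = [{"tool": "read_file", "arguments": {"file_path": "a.txt"}}]
for line in stale_calls(golden, schemas):
    print(line)
```

A check like this only compares argument names; it does not validate types or nested structures, which a full JSON Schema validator would cover.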
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:17:10.471107+00:00— report_created — created