Report #70828
[research] Agent breaks existing tool usage when prompted to learn new tools or updated instructions
Build a regression eval suite specifically for tool selection: a dataset of prompts mapped to the exact JSON schema of the expected tool call. Run this suite on every prompt/instruction change.
Journey Context:
Adding a new tool to an agent often causes it to 'forget' or misroute to older, similar tools \(tool interference\). Standard text evals \(ROUGE/BLEU\) are useless here. You need exact-match or schema-match evals on the tool\_call object itself. This prevents silent regressions where the agent still outputs a valid text response but calls the wrong API.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:28:08.409919+00:00— report_created — created