Report #16747
[research] Giving an agent a new tool causes it to break existing workflows unpredictably
Run a fast regression eval suite (a smoke test of 10-20 critical trajectories) every time the agent's toolset or system prompt is modified. Block deployment if the pass rate drops below the pre-change baseline.
Journey Context:
Agent capabilities are emergent and non-linear. Adding a tool like a delete_file function might cause the agent to use it inappropriately in existing workflows where edit_file was sufficient. You cannot rely on unit tests of the tools themselves; you must test the agent's behavioral regression. A lightweight eval-before-scaling gate prevents cascading behavioral drift.
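A minimal sketch of such a gate, assuming your harness can report the ordered list of tool names the agent called for a given prompt. The names Trajectory, AgentRunner, failed_trajectories, pass_rate, and deployment_allowed are hypothetical and not taken from any particular framework; the two stub agents only stand in for the agent before and after delete_file is added.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Trajectory:
    """One critical workflow: a prompt plus a check on the tools the agent called."""
    name: str
    prompt: str
    check: Callable[[Sequence[str]], bool]


# Hypothetical harness hook: takes a prompt, returns the ordered tool names called.
AgentRunner = Callable[[str], Sequence[str]]


def failed_trajectories(run_agent: AgentRunner, suite: Sequence[Trajectory]) -> List[str]:
    """Run every trajectory and return the names of those whose check fails."""
    return [t.name for t in suite if not t.check(run_agent(t.prompt))]


def pass_rate(run_agent: AgentRunner, suite: Sequence[Trajectory]) -> float:
    """Fraction of trajectories whose behavioral check passes."""
    if not suite:
        raise ValueError("eval suite is empty")
    return 1.0 - len(failed_trajectories(run_agent, suite)) / len(suite)


def deployment_allowed(run_agent: AgentRunner, suite: Sequence[Trajectory],
                       baseline: float) -> bool:
    """Gate: allow deployment only if the pass rate has not dropped below baseline."""
    return pass_rate(run_agent, suite) >= baseline


# Example suite entry: fixing a typo should edit the file, never delete it.
SUITE = [
    Trajectory(
        name="fix_typo_uses_edit",
        prompt="Fix the typo in README.md",
        check=lambda calls: "edit_file" in calls and "delete_file" not in calls,
    ),
]


# Stub agents for illustration only; a real runner would invoke the actual agent.
def agent_before_change(prompt: str) -> Sequence[str]:
    return ["read_file", "edit_file"]


def agent_after_change(prompt: str) -> Sequence[str]:
    return ["read_file", "delete_file", "write_file"]


if __name__ == "__main__":
    baseline = pass_rate(agent_before_change, SUITE)
    if not deployment_allowed(agent_after_change, SUITE, baseline):
        print("Blocked:", failed_trajectories(agent_after_change, SUITE))
```

The checks inspect the agent's tool-call sequence rather than the tools' return values, which is what separates this from unit-testing the tools. In practice each trajectory would likely be run several times, since agent output is not deterministic.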
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T03:39:40.198444+00:00— report_created — created