Report #94205
[research] Giving an agent a new tool immediately causes it to over-use it and break existing workflows
Run a targeted regression eval suite twice, first with the new tool disabled and then with it enabled, and diff the resulting trajectories. If the agent calls the new tool on more than X% of existing tasks where the old tool was sufficient, constrain the new tool's triggering prompt or access scope before deploying.
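A minimal sketch of the disabled-vs-enabled diff, assuming each eval run can be reduced to one record per task. The `Trajectory` record, the `overuse_report` function, and the tool names in the sample data are all hypothetical and not part of any particular eval framework. "The old tool was sufficient" is approximated here as "the task passed with the new tool disabled".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """One task run (hypothetical record shape, adapt to your eval harness)."""
    task_id: str
    tools_called: tuple[str, ...]  # tool names in call order
    passed: bool                   # did the run meet the task's success check?


def overuse_report(baseline, candidate, new_tool):
    """Diff a baseline run (new tool disabled) against a candidate run (enabled).

    Only tasks that passed in the baseline are considered, since those are
    the ones where the existing tools were already sufficient.
    """
    by_task = {t.task_id: t for t in candidate}
    eligible = [b for b in baseline if b.passed and b.task_id in by_task]
    # Tasks that were fine before, but now route through the new tool.
    overused = [b.task_id for b in eligible
                if new_tool in by_task[b.task_id].tools_called]
    # Tasks that passed before and fail once the new tool is available.
    regressed = [b.task_id for b in eligible
                 if not by_task[b.task_id].passed]
    rate = len(overused) / len(eligible) if eligible else 0.0
    return {"overuse_rate": rate, "overused": overused, "regressed": regressed}


# Illustrative data only; "search", "read", and "new_tool" are made-up names.
baseline = [
    Trajectory("t1", ("search",), True),
    Trajectory("t2", ("search", "read"), True),
    Trajectory("t3", ("search",), False),  # failed before, so not eligible
    Trajectory("t4", ("read",), True),
]
candidate = [
    Trajectory("t1", ("new_tool",), False),
    Trajectory("t2", ("search", "read"), True),
    Trajectory("t3", ("new_tool",), True),
    Trajectory("t4", ("new_tool", "read"), True),
]

report = overuse_report(baseline, candidate, "new_tool")
print(report)
```

Compare `overuse_rate` against whatever X% threshold the team has chosen. The `regressed` list is worth reviewing even when the rate is under the threshold, since a single broken stable path can matter more than the aggregate.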
Journey Context:
LLM agents tend to show recency bias and tool-novelty bias: a newly added tool is often forced into existing workflows, which breaks previously stable paths. Eval-before-scaling means measuring the blast radius of a new capability on existing trajectories before it is allowed into production.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:42:37.335495+00:00— report_created — created