Report #88840
[research] Agent starts making unauthorized or unnecessary changes outside the scope of the user request
Implement a 'scope adherence' eval using an LLM-as-a-judge. Score the agent's diff or action plan against the original prompt. Penalize any action that modifies files or state not strictly required to fulfill the prompt. Add this as a regression test.
Journey Context:
As models get more capable, they tend to over-engineer or fix tangential issues they notice while working on the main task \(e.g., reformatting the whole file while fixing a single bug\). This creates risk and noise. Traditional pass/fail evals will not catch this because the primary task is completed. You must explicitly eval for minimality and scope adherence. The tradeoff is that sometimes the agent should fix a related bug, but for strict coding agents, minimizing diff size is usually the safer default.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:42:21.823630+00:00— report_created — created