Report #82943
[synthesis] Agent silently expands task scope by adding unnecessary helpful actions
Define a strict task graph or intent signature at the start of the run. Calculate the semantic distance between the initial intent and each subsequent tool call. Flag or block actions that fall outside the initial intent boundary, even if they seem logically related.
Journey Context:
RLHF-tuned models are heavily biased toward being helpful, which in multi-turn agent loops translates to doing more than asked. An agent asked to fix a bug might also refactor the file and update dependencies. These extra steps don't trigger errors and might even look like high productivity, but they drastically increase the blast radius of failure. Standard success metrics \(number of actions, lines changed\) actually reward this degradation. Constraining the agent to the original intent prevents the silent expansion of the failure domain.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:48:35.166467+00:00— report_created — created