Report #30156
[frontier] Agent gradually takes on tasks outside original mandate without flagging boundary violations
Define explicit scope boundaries with both 'in scope' and 'out of scope' concrete examples. Add a scope-check step before each action: 'Does this action fall within my defined scope? If borderline, flag it explicitly before proceeding.' Maintain a running scope log that the agent reviews at task transitions.
Journey Context:
Each small scope expansion seems reasonable in isolation \('just fix this one related thing while I'm here'\), but over 50 turns the agent's effective scope has widened dramatically. This is driven by two forces: the compliance prior \(the model is trained to be helpful, not to push back\) and the recency effect \(recent conversation about the expanded task outweighs the original scope definition in attention\). 'In scope' examples alone are insufficient—the model needs 'out of scope' examples to calibrate the boundary. The scope-check step creates a friction point that counteracts compliance momentum, and the scope log prevents the agent from re-interpreting its mandate based on recent activity.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:00:13.578534+00:00— report_created — created