Report #90016
[frontier] Agent gradually expands beyond its original scope through a series of small incremental user requests
Define explicit scope boundaries with tripwire conditions: specific action categories (file writes, network calls, permission changes, deployment steps) that trigger a mandatory re-evaluation of the entire request chain against the original constraints. When a tripwire fires, force the agent to re-anchor by summarizing the original scope and comparing it against the accumulated request chain.
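A minimal sketch of such a tripwire check, under stated assumptions: the names `Action`, `TRIPWIRES`, `Verdict`, and `ScopeGuard` are hypothetical and not part of any agent framework, and scope is modeled simply as a set of allowed action categories. A real agent would likely need a richer scope model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    READ = "read"
    FILE_WRITE = "file_write"
    NETWORK_CALL = "network_call"
    PERMISSION_CHANGE = "permission_change"
    DEPLOY = "deploy"


# Tripwire categories, taken from the list in the report.
TRIPWIRES = frozenset(
    {Action.FILE_WRITE, Action.NETWORK_CALL, Action.PERMISSION_CHANGE, Action.DEPLOY}
)


@dataclass
class Verdict:
    fired: bool        # did this request hit a tripwire?
    violations: list   # requests in the chain that fall outside the scope
    summary: str       # re-anchoring text; empty when no tripwire fired


@dataclass
class ScopeGuard:
    original_scope: str   # human-readable statement of the mandate
    allowed: frozenset    # assumption: scope modeled as allowed Action values
    chain: list = field(default_factory=list)

    def check(self, request: str, action: Action) -> Verdict:
        """Record the request; on a tripwire, re-check the whole chain."""
        self.chain.append((request, action))
        if action not in TRIPWIRES:
            return Verdict(False, [], "")
        # Re-evaluate every accumulated request, not just the latest one.
        violations = [req for req, act in self.chain if act not in self.allowed]
        summary = (
            f"Original scope: {self.original_scope}\n"
            f"Requests so far: {[req for req, _ in self.chain]}\n"
            f"Outside scope: {violations}"
        )
        return Verdict(True, violations, summary)


guard = ScopeGuard(
    original_scope="read and edit local files; no deployment",
    allowed=frozenset({Action.READ, Action.FILE_WRITE}),
)
guard.check("read this file", Action.READ)
guard.check("modify this variable", Action.FILE_WRITE)
verdict = guard.check("deploy this change", Action.DEPLOY)
if verdict.violations:
    print(verdict.summary)  # surface the comparison before acting
```

The guard re-checks the full chain on every tripwire, so a request that slipped through earlier without firing a tripwire is still caught at the next boundary.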
Journey Context:
Users naturally expand an agent's scope through incremental requests, each individually reasonable but cumulatively beyond the original mandate: 'read this file' → 'modify this variable' → 'run this script' → 'deploy this change.' The agent's attention focuses on the most recent request rather than on the original scope definition. Simple scope instructions ('only modify test files') decay because nothing reinforces them: the agent never reaches a point where it explicitly checks scope. Tripwire conditions supply that reinforcement by forcing explicit scope verification at defined boundaries. The key insight is that tripwires must be defined in terms of action categories rather than content categories, because action categories are unambiguous triggers. Content-based tripwires ('when the topic changes to production') are too ambiguous and get ignored. The tradeoff is that tripwires add friction to legitimate scope expansions, but this friction is the feature: it forces intentional scope renegotiation rather than silent drift.
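The renegotiation step can be sketched as follows, using the request chain from the paragraph above. Everything here is illustrative: `run_chain`, the `approve` callback, and the mapping of each request to an action category are assumptions. In particular, the `"execute"` category is not in the report's tripwire list and is added only to cover 'run this script'.

```python
from typing import Callable, Iterable

# The report's example chain, tagged with hypothetical action categories.
CHAIN = [
    ("read this file", "read"),
    ("modify this variable", "file_write"),
    ("run this script", "execute"),
    ("deploy this change", "deploy"),
]


def run_chain(
    chain: Iterable[tuple],
    allowed: Iterable[str],
    approve: Callable[[str, str], bool],
) -> list:
    """Walk the chain; any action outside scope needs explicit approval."""
    scope = set(allowed)
    log = []
    for request, action in chain:
        if action in scope:
            log.append((request, "allowed"))
        elif approve(request, action):
            # Scope grows only through an explicit, recorded decision.
            scope.add(action)
            log.append((request, "approved expansion"))
        else:
            log.append((request, "blocked"))
    return log


# Example policy: the user agrees to script execution but not deployment.
log = run_chain(
    CHAIN,
    allowed={"read", "file_write"},
    approve=lambda request, action: action == "execute",
)
for request, outcome in log:
    print(f"{request}: {outcome}")
```

Because the trigger is the action category, no judgment about topic is needed to decide when to ask. The friction lands exactly on the two requests that exceed the original mandate.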
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:41:13.307327+00:00— report_created — created