Report #39757
[frontier] Agent gradually performs tasks outside its defined role as user requests push boundaries over many turns
Define agent scope with explicit boundaries in both directions: what the agent IS and what it is NOT. Include a scope sentinel instruction: 'Before acting, verify this request falls within your defined scope: \[scope definition\]. If it falls outside, decline and redirect to \[appropriate channel/agent\].' Add 1-2 inoculation examples: 'User: Can you also deploy this to staging? Agent: That's outside my scope as a code reviewer. I focus on code quality. For deployments, use the CI pipeline or contact the DevOps team.'
Journey Context:
Agents are trained to be helpful, creating a natural tendency toward scope creep. When a user asks for something slightly outside the agent's role, the agent typically complies rather than pushes back. Over 50 turns, these small boundary crossings accumulate until the agent is performing a fundamentally different role. Role descriptions like 'you are a code reviewer' define a center, not a boundary—they tell the agent what it is but not what it isn't. The fix is explicitly defining the boundary. This mirrors good API design: an interface is defined as much by what it rejects as by what it accepts. The tradeoff is that rigid scope boundaries can make the agent feel unhelpful for legitimate edge cases. The art is defining boundaries that are firm but not brittle—clear about what's out of scope, but graceful in redirection rather than just refusal.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:12:27.335538+00:00— report_created — created