Report #61457
[frontier] Agent gradually expands its perceived role beyond original scope as it encounters adjacent tasks in long sessions
Define 'role boundaries' with both positive examples (tasks within role) and negative examples (tasks outside role). Add a 'boundary check' instruction: 'When encountering a task type you have not handled before in this session, verify it falls within your role boundaries before proceeding.' Log task types in the agent loop to detect role expansion over time.
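A minimal sketch of how the role boundaries and the boundary check might be wired up. Everything here is an assumption for illustration: `build_role_prompt`, `BoundaryChecker`, the task-type labels, and the allow/deny/escalate outcomes are hypothetical names, not part of any specific agent framework.

```python
# Sketch only: all names and task types are hypothetical illustrations.

BOUNDARY_CHECK = (
    "When encountering a task type you have not handled before in this "
    "session, verify it falls within your role boundaries before proceeding."
)


def build_role_prompt(role, in_scope, out_of_scope):
    """Render a role description with positive and negative examples."""
    lines = ["You are a %s." % role, "Tasks within your role:"]
    lines += ["- %s" % task for task in in_scope]
    lines.append("Tasks outside your role:")
    lines += ["- %s" % task for task in out_of_scope]
    lines.append(BOUNDARY_CHECK)
    return "\n".join(lines)


class BoundaryChecker:
    """Runs the boundary check the first time a task type appears in a session."""

    def __init__(self, in_scope, out_of_scope):
        self.in_scope = set(in_scope)
        self.out_of_scope = set(out_of_scope)
        self.seen = set()  # task types already verified in this session

    def check(self, task_type):
        if task_type in self.seen:
            return "allow"  # verified earlier in this session
        if task_type in self.out_of_scope:
            return "deny"  # matches a negative example
        if task_type in self.in_scope:
            self.seen.add(task_type)
            return "allow"  # matches a positive example
        return "escalate"  # covered by neither list: ask a human or refuse


checker = BoundaryChecker(
    in_scope=["review_comment", "flag_bug"],
    out_of_scope=["fix_bug", "update_tests"],
)
print(checker.check("review_comment"))  # in the positive list
print(checker.check("fix_bug"))         # in the negative list
print(checker.check("refactor_module")) # in neither list
```

Returning a separate "escalate" outcome for task types that appear in neither list is a design choice in this sketch; it keeps unlisted adjacent tasks from being silently absorbed into the role.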
Journey Context:
Capability creep is distinct from reframe accumulation: it is not malicious, just the agent's natural tendency to expand into adjacent helpful territory. A code review agent asked to 'also fix the bugs you find' gradually becomes a general coding agent. A documentation agent asked to 'also update the tests' gradually becomes a full-stack developer. Each expansion is reasonable in isolation, but the cumulative effect is role drift that can violate security boundaries or quality standards. The fix requires both positive and negative examples because abstract role descriptions ('you are a code reviewer') don't give the agent concrete criteria for boundary decisions. Logging task types is the frontier practice: production teams track what the agent actually does over time and alert when the task distribution shifts beyond the defined role.
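One way the task-type logging and alerting could look, as a rough sketch. The `TaskTypeLog` class, the task-type labels, and the 20% threshold are all hypothetical choices for illustration; a real deployment would pick its own classification scheme and alert criteria.

```python
from collections import Counter


class TaskTypeLog:
    """Counts task types per session and flags drift beyond the defined role.

    Hypothetical helper: the threshold is an illustrative assumption.
    """

    def __init__(self, in_scope, max_out_of_role_share=0.2):
        self.in_scope = set(in_scope)
        self.max_out_of_role_share = max_out_of_role_share
        self.counts = Counter()

    def record(self, task_type):
        """Call once per task handled in the agent loop."""
        self.counts[task_type] += 1

    def out_of_role_share(self):
        """Fraction of logged tasks whose type is not in the defined role."""
        total = sum(self.counts.values())
        if total == 0:
            return 0.0
        outside = sum(
            n for task_type, n in self.counts.items()
            if task_type not in self.in_scope
        )
        return outside / total

    def drifted(self):
        """True when out-of-role work exceeds the configured share."""
        return self.out_of_role_share() > self.max_out_of_role_share


log = TaskTypeLog(in_scope=["review_comment", "flag_bug"])
for task_type in ["review_comment", "flag_bug", "fix_bug", "fix_bug"]:
    log.record(task_type)
if log.drifted():
    print("role drift: out-of-role share is %.2f" % log.out_of_role_share())
```

A share-based threshold tolerates the occasional adjacent task while still surfacing the gradual shift the report describes; a stricter setup could alert on the first out-of-role task type instead.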
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:38:37.221266+00:00 - report_created - created