Report #72133
[frontier] Agent remembers what it CAN do but forgets what it SHOULD NOT do — asymmetric drift where capabilities persist but constraints decay
Design your prompt architecture with asymmetric reinforcement: state capabilities once, clearly, with examples; state constraints repeatedly, at system prompt boundaries, in tool descriptions, and at chapter handoffs. Never rely on a single statement of a constraint in any session expected to exceed 15 turns. Capabilities are self-reinforcing and constraints are self-eroding, so engineer accordingly.
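A minimal sketch of a pre-flight check for the 15-turn rule above. The `Constraint` class, the placement labels, and the `under_reinforced` function are hypothetical names chosen for illustration; only the 15-turn threshold comes from this report.

```python
from dataclasses import dataclass, field

# Threshold from the recommendation above; all other names are illustrative.
LONG_SESSION_TURNS = 15


@dataclass
class Constraint:
    text: str
    # Where the constraint is stated, e.g. {"system_prompt", "tool:send_email"}.
    placements: set = field(default_factory=set)


def under_reinforced(constraints, expected_turns):
    """Return constraints stated in fewer than two places for a long session."""
    if expected_turns <= LONG_SESSION_TURNS:
        return []
    return [c for c in constraints if len(c.placements) < 2]


constraints = [
    Constraint("Never delete files", {"system_prompt"}),
    Constraint("No external email", {"system_prompt", "tool:send_email"}),
]
flagged = under_reinforced(constraints, expected_turns=50)
```

Here `flagged` would contain only the "Never delete files" constraint, because it is stated in a single place in a session expected to run past 15 turns.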
Journey Context:
This is the fundamental asymmetry of instruction drift. Capabilities are self-reinforcing: every time the agent successfully uses a capability, the execution loop makes that capability more salient. Constraints are self-eroding: every time the agent successfully avoids violating a constraint, nothing draws attention to it, and it becomes less salient. In a 50-turn session, the agent ends up with an inflated sense of what it can do and a diminished sense of what it should not do, which is the most dangerous combination. The fix is to stop treating capabilities and constraints symmetrically in prompt design. Capabilities need one clear statement with examples. Constraints need redundant placement at every attention anchor point: system prompt boundaries, tool descriptions, chapter handoffs, and self-audit prompts. This asymmetric design pattern is emerging in 2025 as teams realize that 'put everything in the system prompt once' is a strategy that works only for capabilities, not for constraints.
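The placement pattern described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: every function name, the prompt wording, and the sample capability and constraint strings are hypothetical, and no particular agent framework is assumed.

```python
# Hypothetical sample content; replace with your own capability and constraints.
CAPABILITIES = [
    "You can read and summarize files in the workspace. "
    "Example: summarize notes.txt in three bullets.",
]
CONSTRAINTS = [
    "Do not delete or overwrite files.",
    "Do not send data outside the workspace.",
]


def constraint_block(constraints):
    """Render constraints as one reusable block of text."""
    lines = ["Constraints (always in force):"]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)


def build_system_prompt(capabilities, constraints):
    """Capabilities appear once; constraints appear at both prompt boundaries."""
    block = constraint_block(constraints)
    caps = "\n".join(f"- {c}" for c in capabilities)
    return f"{block}\n\nCapabilities:\n{caps}\n\n{block}"


def build_tool_description(base_description, constraints):
    """Repeat the constraints inside a tool description."""
    return f"{base_description}\n{constraint_block(constraints)}"


def build_handoff(summary, constraints):
    """Repeat the constraints when handing off between chapters of a session."""
    return f"Summary so far: {summary}\n\n{constraint_block(constraints)}"


def build_self_audit(constraints):
    """Ask the agent to check its planned action against each constraint."""
    checks = "\n".join(f"- Does the next action violate: {c}" for c in constraints)
    return f"Before acting, answer each question:\n{checks}"


system_prompt = build_system_prompt(CAPABILITIES, CONSTRAINTS)
tool_text = build_tool_description("write_file: writes text to a path.", CONSTRAINTS)
handoff_text = build_handoff("Indexed 12 files.", CONSTRAINTS)
audit_text = build_self_audit(CONSTRAINTS)
```

Rendering the constraints through a single `constraint_block` helper keeps the repeated copies identical, so the redundancy reinforces one wording rather than introducing slightly different versions of the same rule.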
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:39:37.405813+00:00— report_created — created