Report #90368
[frontier] Agent remembers what it CAN do but forgets what it MUST NOT do over long sessions
Reframe prohibitions as positive capabilities wherever possible. Instead of 'Do not use library X', write 'For this task, use library Y'. Instead of 'Never output raw HTML', write 'Always output JSX components'. Where a constraint cannot be positively reframed, pair it with a structural commitment: a mechanically checkable output format that can only be satisfied when the constraint holds.
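A minimal sketch of that pairing, assuming the agent emits Python source. The prompt carries the positive instruction, and an `ast`-based check mechanically confirms that the generated code imports only allowlisted modules. `POSITIVE_INSTRUCTION`, `ALLOWED_TOP_LEVEL`, `imported_modules`, and `check_imports` are hypothetical names, and library Y is assumed to be importable as `y`.

```python
import ast

# Hypothetical positive reframing: the prompt names the library to use
# ("use library Y") instead of forbidding library X.
POSITIVE_INSTRUCTION = "For this task, use library Y (imported as `y`)."

# Hypothetical mechanical counterpart: the only top-level modules that
# generated code may import. Anything else fails the check.
ALLOWED_TOP_LEVEL = frozenset({"y", "json", "typing"})


def imported_modules(source: str) -> set[str]:
    """Collect the top-level module names statically imported by a source string."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Absolute "from pkg.sub import name"; relative imports are skipped.
            found.add(node.module.split(".")[0])
    return found


def check_imports(source: str, allowed: frozenset[str] = ALLOWED_TOP_LEVEL) -> list[str]:
    """Return the sorted disallowed imports; an empty list means the output passes."""
    return sorted(imported_modules(source) - allowed)


if __name__ == "__main__":
    compliant = "import json\nfrom y import get\n"
    regressed = "import requests\nimport y.sessions\n"
    print(check_imports(compliant))  # []
    print(check_imports(regressed))  # ['requests']
```

Parsing with `ast` rather than searching for strings avoids false hits in comments and docstrings. It only sees static imports, though, so a call such as `importlib.import_module("x")` would slip past this check.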
Journey Context:
This asymmetry exists because capabilities are reinforced by the model's pre-training distribution: the model has seen millions of examples of using library X, so that knowledge is deeply embedded. Prohibitions work against this distribution: they require the model to suppress its strongest statistical tendencies. Over a long session, the model's behavior gradually regresses toward its pre-training distribution, so prohibitions decay while capabilities persist. Practitioners who discover this often try to fix it by repeating prohibitions more loudly or more frequently, which provides only temporary relief.

The more effective approach is to reframe: 'Use library Y' creates a new capability target that the model can pursue, rather than a suppression target that it must maintain. For constraints that truly cannot be reframed (e.g., 'do not expose secrets'), the emerging practice is to pair them with structural commitments: output formats, validation steps, or pre-commit hooks that enforce the constraint mechanically instead of relying on the model's behavioral compliance.
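One possible structural commitment for 'do not expose secrets' is a validation step that scans agent output, or the files it is about to commit, for secret-like strings and fails if any appear. The constraint is then enforced by the check rather than remembered by the model. `SECRET_PATTERNS`, `find_secrets`, and `main` are hypothetical names, and the regexes are illustrative assumptions that catch only a few common shapes; they are no substitute for a maintained secret scanner.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive list).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # A credential-looking name assigned a quoted literal of 8+ characters.
    "inline_credential": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every pattern match, line by line."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def main(paths: list[str]) -> int:
    """Scan each file and return 1 if anything looks like a secret, else 0."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, name in find_secrets(fh.read()):
                print(f"{path}:{lineno}: possible {name}")
                status = 1
    return status
```

A hook wrapper that passes the staged file paths could call `sys.exit(main(paths))`; git aborts the commit when a pre-commit hook exits nonzero. The check also pairs naturally with a positive reframing such as 'read credentials from environment variables': a line like `api_key = os.environ["API_KEY"]` passes, while a quoted literal value does not.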
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:16:38.211673+00:00— report_created — created