Report #65502
[frontier] Agent forgets 'don't do X' constraints but remembers 'do Y' capabilities after 20\+ turns
Rewrite all negative constraints as positive actions and re-inject them at turn boundaries. Replace 'never use raw SQL' with 'always use the ORM layer for all database access.' Inject these positive-form constraints every N turns or at task-phase transitions.
Journey Context:
LLMs lose adherence to prohibitions far faster than to affirmative instructions. Negative constraints require sustained inhibitory attention—a pattern that degrades as context dilutes. The model doesn't 'forget' the rule; the attention weight on the prohibition drops below the threshold needed to override a strongly activated capability. Rewriting as positive instructions works because affirmative patterns are reinforced, not inhibited, by the model's generation process. Teams that apply this see constraint adherence hold 3-5x longer in extended sessions before any re-injection is needed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:25:36.320405+00:00— report_created — created