Report #59958
[frontier] Agent forgets negative constraints but retains capabilities over long sessions
Reframe all critical constraints as positive actions. Replace 'never output raw JSON' with 'always wrap JSON in a markdown code block.' Positive constraints get reinforced every time the agent executes them; negative constraints have no rehearsal mechanism and decay silently.
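As a starting point for the reframing, a small lint pass can list the prohibitions in an existing prompt so they can be rewritten by hand. This is a minimal sketch, not an established tool: the marker list and the function name find_negative_constraints are assumptions made for illustration, and the check only looks at the start of each line.

```python
import re

# Hypothetical list of phrasings that signal a prohibition; extend as needed.
# An optional leading bullet ("-" or "*") is skipped before matching.
NEGATIVE_MARKERS = re.compile(
    r"^\s*(?:[-*]\s+)?(never|don't|do not|avoid|must not)\b",
    re.IGNORECASE,
)


def find_negative_constraints(prompt: str) -> list:
    """Return the prompt lines that open with a prohibition marker."""
    return [
        line.strip()
        for line in prompt.splitlines()
        if NEGATIVE_MARKERS.match(line)
    ]


example_prompt = (
    "Never output raw JSON.\n"
    "Always wrap JSON in a markdown code block.\n"
    "- Do not call the delete tool.\n"
)

# Each flagged line is a candidate for a positive rewrite.
for flagged in find_negative_constraints(example_prompt):
    print("reframe:", flagged)
```

The lint only finds candidates. Choosing the positive wording (for example, turning 'never output raw JSON' into 'always wrap JSON in a markdown code block') still needs human judgment.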
Journey Context:
A fundamental asymmetry exists in how LLMs process instructions over long contexts. Capabilities (tool use, reasoning patterns) are reinforced each time the agent invokes them: every use acts as a rehearsal. Constraints, especially negative ones ('don't do X'), have no reinforcement loop, because the agent never 'practices' not doing something. Teams that reframe prohibitions as positive actions see significantly slower drift, because the constraint now piggybacks on the same reinforcement mechanism that preserves capabilities. The tradeoff: some constraints are genuinely hard to express positively, and over-specifying positive actions can limit agent flexibility. For critical safety constraints, stack both: state the positive action AND the prohibition.
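The stacking rule can be sketched as a small prompt builder. This is an illustration under stated assumptions, not a reference implementation: the Constraint record, its field names, and the sample rules about file deletion and user identity are all hypothetical. Ordinary constraints render as the positive action only, critical ones render as positive action plus prohibition, and constraints with no workable positive form keep the prohibition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraint:
    """Hypothetical record pairing a positive action with its prohibition."""
    positive: Optional[str] = None      # what the agent should do
    prohibition: Optional[str] = None   # the original "don't" wording
    critical: bool = False              # stack both forms when True


def render(c: Constraint) -> str:
    """Render one constraint following the stacking rule described above."""
    if c.positive is None and c.prohibition is None:
        raise ValueError("constraint needs a positive action or a prohibition")
    if c.positive is None:
        # Hard to express positively: fall back to the prohibition.
        return c.prohibition
    if c.critical and c.prohibition:
        # Critical: state the positive action AND the prohibition.
        return f"{c.positive} {c.prohibition}"
    return c.positive


def build_constraint_section(constraints) -> str:
    """Join rendered constraints into a bulleted prompt section."""
    return "\n".join(f"- {render(c)}" for c in constraints)


# Sample rules; the second and third are invented for illustration.
rules = [
    Constraint("Always wrap JSON in a markdown code block.",
               "Never output raw JSON."),
    Constraint("Route every file deletion through the confirm step.",
               "Never delete files without confirmation.", critical=True),
    Constraint(prohibition="Do not speculate about user identity."),
]

print(build_constraint_section(rules))
```

Keeping the prohibition on the record even when it is not rendered makes it cheap to promote a constraint to critical later without rewriting it.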
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T07:07:34.967030+00:00 - report_created - created