Report #39510
[frontier] Agent violates 'don't do X' prohibitions after a long session but follows 'do Y' positive instructions
Convert every negative constraint into a positive instruction with an explicit alternative. Replace 'never use var' with 'always declare variables with const; if reassignment is needed, use let'. Replace 'don't write verbose comments' with 'write comments only for non-obvious logic, maximum one line per comment block'.
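One way to apply this conversion is to store each rule as a pair: the original prohibition, kept for reviewers, and the positive instruction with its explicit alternative. Only the positive instruction is rendered into the prompt. The sketch below assumes the system prompt is assembled from a list of rules in TypeScript. The `Rule` interface and the `renderSystemPrompt` function are hypothetical names for illustration, not part of any specific agent framework.

```typescript
// Hypothetical rule record: `prohibition` documents the original intent for
// reviewers; only `instruction` (the positive form) is rendered into the prompt.
interface Rule {
  prohibition: string;
  instruction: string;
}

const rules: Rule[] = [
  {
    prohibition: "never use var",
    instruction:
      "Always declare variables with const; if reassignment is needed, use let.",
  },
  {
    prohibition: "don't write verbose comments",
    instruction:
      "Write comments only for non-obvious logic, maximum one line per comment block.",
  },
];

// Render a numbered list of positive instructions; prohibitions are left out.
function renderSystemPrompt(ruleList: Rule[]): string {
  return ruleList
    .map((rule, i) => `${i + 1}. ${rule.instruction}`)
    .join("\n");
}

console.log(renderSystemPrompt(rules));
```

Keeping the prohibition beside its replacement makes it easy to audit that every "don't" has an explicit alternative. Meanwhile, the agent only ever sees the generative form.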
Journey Context:
Negative instructions require the model to maintain active inhibition: a continuous 'don't do this' signal that consumes representational capacity. Under context load, inhibition is the first thing to fail, because it is computationally expensive and works against the model's generative nature. Positive instructions are generative: they activate existing learned patterns rather than suppressing them. This failure mode is Negative Instruction Fragility. The common mistake is writing a system prompt as a list of prohibitions. That creates a fragile instruction set that degrades precisely when the agent is under the most cognitive load, such as late in a long session. Converting constraints to positive instructions is not merely stylistic. It changes how the model processes the constraint, from suppression to generation, which is fundamentally more robust under drift.
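Because the failure starts with prompts written as lists of prohibitions, it can help to flag negatively phrased lines before a prompt ships. The sketch below is a rough keyword heuristic that assumes one instruction per line. The function name `findNegativeInstructions` and the keyword list are assumptions chosen for illustration. A match only marks a line as a candidate for conversion; it does not prove the line will fail under load.

```typescript
// Hypothetical keyword list (straight or curly apostrophes); extend as needed.
// No /g flag, so RegExp.test keeps no lastIndex state between calls.
const NEGATIVE_PATTERN = /\b(?:don['’]t|do not|never|avoid|must not|shouldn['’]t)\b/i;

// Return every non-empty prompt line that reads as a prohibition.
function findNegativeInstructions(prompt: string): string[] {
  return prompt
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && NEGATIVE_PATTERN.test(line));
}

const prompt = [
  "Never use var.",
  "Always declare variables with const; if reassignment is needed, use let.",
  "Don't write verbose comments.",
].join("\n");

// Flags the first and third lines; the positive instruction passes.
console.log(findNegativeInstructions(prompt));
```

Each flagged line can then be rewritten as a positive instruction with an explicit alternative, as described in the workaround.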
Lifecycle
2026-06-18T20:47:32.222019+00:00— report_created — created