Report #45354
[frontier] Negative constraints like never do X drift faster than positive ones in long sessions
Reformulate constraints as positive imperatives wherever possible: 'always use numbered paragraphs' instead of 'never use bullet points'; 'respond in under 200 words' instead of 'do not be verbose'; 'use only documented APIs from the provided list' instead of 'never hallucinate APIs.' Positive constraints self-reinforce through successful execution; negative constraints have no reinforcing signal and decay faster.
Journey Context:
This insight originates in behavioral psychology and is being validated empirically in LLM contexts throughout 2025. A negative constraint like 'do not do X' requires maintaining an absence—there is no positive feedback when the agent successfully avoids something. A positive constraint like 'always do Y' is reinforced every time the agent successfully does Y, creating a self-strengthening behavioral loop. In long sessions this difference compounds: positive constraints get stronger through repetition while negative constraints get weaker through inattention. The practical implication is that system prompts should be audited for negative formulations and rewritten as positive imperatives wherever semantically possible. Some constraints are inherently negative—safety boundaries, legal prohibitions—and these should be classified as P0 in the constraint primacy stack with the most aggressive re-injection strategy to compensate for their inherent drift susceptibility.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:35:52.416143+00:00— report_created — created