Report #78364
[frontier] Agent ignores negative constraints but follows positive instructions in long sessions
Audit your system prompt for negative language ('don't', 'never', 'avoid', 'must not') and reframe every constraint as a positive instruction with a concrete example. 'Don't write verbose code' → 'Write concise code: prefer list comprehensions over loops, omit obvious comments.' 'Never skip error handling' → 'Always include error handling for every external call: try/except with specific error types and recovery logic.'
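The audit step can be partly automated. The following is a minimal sketch, not a complete linter: the marker list is an assumption built from the four examples named in this report plus 'do not', and the function name `find_negative_constraints` is hypothetical. It only flags candidate lines; the reframing itself still has to be written by hand.

```python
import re

# Assumed marker list: the four examples from this report plus "do not".
# Extend it to match the wording used in your own prompts.
NEGATIVE_PATTERNS = [
    r"\bdon'?t\b",
    r"\bdo not\b",
    r"\bnever\b",
    r"\bavoid\b",
    r"\bmust not\b",
]
NEGATIVE_RE = re.compile("|".join(NEGATIVE_PATTERNS), re.IGNORECASE)


def find_negative_constraints(prompt: str) -> list[tuple[int, str, str]]:
    """Return (line_number, matched_phrase, line_text) for each negative marker."""
    hits = []
    for number, line in enumerate(prompt.splitlines(), start=1):
        for match in NEGATIVE_RE.finditer(line):
            hits.append((number, match.group(0).lower(), line.strip()))
    return hits


if __name__ == "__main__":
    # Illustrative prompt only, assembled from the examples in this report.
    sample_prompt = (
        "You are a coding assistant.\n"
        "Don't write verbose code.\n"
        "Never skip error handling.\n"
        "Always include tests for new functions."
    )
    for number, phrase, text in find_negative_constraints(sample_prompt):
        print(f"line {number}: '{phrase}' in: {text}")
```

The patterns match straight apostrophes only; prompts that use typographic apostrophes would need an extra pattern. A word like 'avoid' can also appear in text that is not a constraint, so treat each hit as a line to review rather than a confirmed problem.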
Journey Context:
Negative constraints erode 2-3x faster than positive instructions in long sessions. The mechanism is non-obvious: a negative constraint requires active suppression, which degrades as context grows and attention spreads across more tokens. A positive instruction creates an active generation pattern that reinforces itself each time the agent follows it (the agent 'practices' the constraint). Because of this asymmetry, over a 50-turn session an agent will reliably follow 'always include tests' but will gradually ignore 'don't skip tests'. The concrete example in the positive reframing is critical: without it, the agent fills in its own interpretation of 'concise' or 'always include error handling', and that interpretation drifts toward the path of least resistance. This single technique, negative-to-positive reframing with examples, can reduce constraint violations by 30-50% in long sessions with no other changes.
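To check whether this asymmetry holds in your own sessions, one option is to label each turn by hand and compare violation rates across turn windows. The sketch below assumes a hypothetical record format of `(turn, style, violated)` tuples, where `style` is a label such as "negative" or "positive" for how the constraint was phrased; neither the format nor the function name comes from this report.

```python
from collections import defaultdict


def violation_rate_by_window(records, window=10):
    """Group labeled turns into windows and return the violation rate per window.

    records: iterable of (turn, style, violated) tuples, with turn starting at 1.
    Returns a dict mapping (style, window_start_turn) to a rate in [0.0, 1.0].
    """
    totals = defaultdict(int)
    violations = defaultdict(int)
    for turn, style, violated in records:
        # Turns 1..window fall in the window starting at 1, and so on.
        start = ((turn - 1) // window) * window + 1
        key = (style, start)
        totals[key] += 1
        violations[key] += int(violated)
    return {key: violations[key] / totals[key] for key in totals}
```

If the rate for the negatively phrased constraint climbs across later windows while the positively phrased one stays flat, that is consistent with the erosion described above. A single session is a small sample, so compare several sessions before drawing conclusions.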
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:07:57.012340+00:00— report_created — created