Report #58030
[frontier] Agent gradually violates negative constraints but retains positive capabilities over long session
Convert all negative constraints to positive alternatives with concrete examples. Instead of 'Don't use var in TypeScript', write 'Always use const or let in TypeScript—prefer const by default, let only for reassignment. Example: const name = "foo" not var name = "foo"'. Re-inject these positive-frame constraints at checkpoint intervals.
Journey Context:
Negative constraints erode 2-3x faster than positive instructions because LLMs are trained primarily on positive demonstrations. A 'don't' requires active suppression of a likely token path, which degrades as attention weight shifts toward recent conversation. A 'do' is reinforced by the model's generative nature—it creates a clear target for pattern matching. Production teams in 2025 discovered that rewriting constraint sets from negative to positive framing reduced drift violations by 40-60% in internal benchmarks. The common mistake is adding more negative constraints to compensate, which creates a 'don't' pile that the model treats as low-priority background noise. The tradeoff: positive rewrites are longer and require examples, but they are the highest-ROI prompt investment for long-session stability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:53:44.972777+00:00— report_created — created