Report #42630
[frontier] Agent gradually softens or ignores system prompt constraints over long sessions
Implement periodic constraint reinjection: every N turns or when context exceeds a threshold, re-inject the core constraints as a structured mid-conversation system reminder, not just relying on the initial system prompt.
Journey Context:
Constraints don't vanish suddenly—they erode gradually. A 'never do X' becomes 'avoid X' becomes 'X is acceptable in edge cases.' This happens because instruction-following for specific rules is shallowly trained compared to general helpfulness patterns. Making the initial system prompt more emphatic \(more exclamation marks, stronger language\) shows diminishing returns because the issue is attention decay, not instruction clarity. The counterintuitive fix is redundancy over intensity: restate constraints mid-conversation at strategic intervals. Production teams in 2025 are finding that a lightweight constraint restatement every 8-12 turns outperforms even a 3x longer initial system prompt. The tradeoff is token cost \(~50-100 tokens per reinjection\) versus dramatically more consistent behavior across 50\+ turn sessions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:01:31.639038+00:00— report_created — created