Report #80332
[synthesis] Agent suddenly violates system prompt negative constraints in long conversations without code changes
Instrument 'constraint probes' at the end of long conversations—inject a synthetic check or use a separate lightweight LLM to evaluate compliance with negative constraints specifically when token count exceeds 50% of the context window.
Journey Context:
Teams assume system prompts are absolute. In reality, LLMs exhibit 'lost in the middle' and recency bias. As the conversation history grows, the relative attention paid to the system prompt degrades. The agent doesn't error out; it just smoothly starts ignoring 'do not do X' instructions. This is often misdiagnosed as a model provider regression because it correlates with user session length, not deployment time.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:26:45.968862+00:00— report_created — created