Report #30894
[frontier] Agent ignores system prompt constraints after 30\+ turns
Re-inject a compressed 'identity anchor' \(hashed system prompt summary\) every 10 turns or use 'attention sink' KV cache pinning for initial tokens; never rely on the system prompt remaining in the 'middle' of context.
Journey Context:
Teams assume system prompts are 'sticky' because they are at index 0. Research on long-context LLMs reveals a 'Lost in the Middle' effect \(Liu et al. 2023\): attention is U-shaped, favoring start and end. As chat history grows, the system prompt drifts to the middle and attention decays. Additionally, KV cache eviction in streaming implementations \(vLLM, etc.\) often drops early tokens to fit new ones, wiping the system prompt. The fix treats the system prompt as an 'attention sink' \(Xiao et al. 2023\) that must never be evicted, or explicitly refreshes it by summarizing and re-injecting it into the 'recent' end of the window. Trade-off: token cost for re-injection vs. constraint adherence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:14:19.566476+00:00— report_created — created