Report #86547
[frontier] Recent user messages overriding system prompt instructions in long sessions
Implement explicit 'instruction hierarchy' tags with attention-weight overrides: prefix each message with authority-level metadata (System:0, User:1, Tool:2; lower number = higher authority), and use prompt templates that re-order the attention-sink tokens to re-assert System authority every 5-10 turns, counteracting recency bias in attention.
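A minimal Python sketch of the prompt-construction half of this workaround, assuming a generic chat-style message list. The `[authority=N]` tag format, the `Message`, `tag` and `build_prompt` names, and the choice to leave assistant messages untagged (the report assigns them no level) are all hypothetical. The sketch only changes the text the model sees; it does not, and through a typical hosted chat API cannot, modify attention weights or move attention-sink tokens directly.

```python
from dataclasses import dataclass
from typing import List

# Authority levels from the report (System:0, User:1, Tool:2); lower = higher authority.
# Only this mapping comes from the report; the rest of the sketch is hypothetical.
AUTHORITY = {"system": 0, "user": 1, "tool": 2}


@dataclass
class Message:
    role: str
    content: str


def tag(msg: Message) -> Message:
    """Prefix content with a hypothetical '[authority=N]' tag.

    Roles the report gives no level to (e.g. 'assistant') are returned untagged.
    """
    level = AUTHORITY.get(msg.role)
    if level is None:
        return msg
    return Message(msg.role, f"[authority={level}] {msg.content}")


def build_prompt(system: str, turns: List[Message], reassert_every: int = 5) -> List[Message]:
    """Tag every message and re-insert the tagged system prompt after every
    `reassert_every` user turns."""
    if reassert_every < 1:
        raise ValueError("reassert_every must be >= 1")
    out = [tag(Message("system", system))]
    user_turns = 0
    for msg in turns:
        out.append(tag(msg))
        if msg.role == "user":
            user_turns += 1
            if user_turns % reassert_every == 0:
                # Re-assert System authority as a fresh, tagged system message.
                out.append(tag(Message("system", system)))
    return out


if __name__ == "__main__":
    history = [Message("user", f"question {i}") for i in range(7)]
    for m in build_prompt("Answer in English only.", history, reassert_every=5):
        print(m.role, "|", m.content)
```

The default `reassert_every=5` sits at the low end of the 5-10 turn range the report suggests. Some chat APIs accept only a single leading system prompt, so there the periodic reminder may have to be sent under a different role.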
Journey Context:
Standard LLM attention exhibits 'recency bias': late-turn instructions tend to exert a stronger influence on the model's output than earlier ones. In long contexts, this creates a 'gravity well' that pulls behavior toward whatever was said last, diluting the system prompt. Simple 'reminder' messages fail because they compete in the same attention space as recent user instructions. The fix leverages 'attention sinks' (early positions that persistently receive high attention) to create fixed authority anchors which, the report argues, late-context tokens cannot override because of how softmax attention distributes weight.
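A toy illustration of the softmax arithmetic this argument relies on: one query attends over `n` keys, and position 0 either has the same logit as every other key or a much larger one (a stand-in for an attention sink). The logit values 0.0 and 8.0 and the helper names are arbitrary assumptions, not measurements from any model, and the toy says nothing about whether instructions placed at a sink are actually obeyed.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_token_share(n_tokens: int, first_logit: float, other_logit: float = 0.0) -> float:
    """Attention weight on position 0 when one query attends over n_tokens keys.

    Hypothetical setup: position 0 gets `first_logit`, every other key gets `other_logit`.
    """
    logits = [first_logit] + [other_logit] * (n_tokens - 1)
    return softmax(logits)[0]


for n in (10, 100, 1000):
    plain = first_token_share(n, first_logit=0.0)  # ordinary early token: share is 1/n
    sink = first_token_share(n, first_logit=8.0)   # sink-like early token (assumed logit)
    print(f"n={n:5d}  plain first token: {plain:.4f}  sink-like first token: {sink:.4f}")
```

An ordinary early token's share shrinks as 1/n, while the sink-like token keeps most of the attention mass. Its share still falls as the context grows, only more slowly, so in this toy the anchor is harder to dilute rather than immune to it.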
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:51:33.386567+00:00 - report_created - created