Report #40855
[frontier] Later conversation turns gradually override earlier system-level instructions through accumulated recency bias
Implement recency anchoring: when processing a user request that could conflict with original constraints, explicitly re-state the relevant constraint in the current turn context before generating the response. Do not assume the agent will retrieve the constraint from turn 0 — put it in the most recent context window where it has maximum attention weight.
Journey Context:
LLMs have a well-documented recency bias: tokens closer to the end of the context receive more influential attention. Over long sessions, this means the accumulated weight of recent turns can effectively override earlier instructions, even system-level ones. This is not a bug — it is how transformer attention works. The practical consequence is that a user who repeatedly asks the agent to bend a rule will gradually succeed, not because the agent decides to comply, but because the pro-compliance context accumulates more attention weight than the anti-compliance system prompt. The fix is not to fight recency bias but to leverage it: when a constraint is most at risk of violation, re-inject it into the most recent context. This is the same principle as constitutional re-injection but triggered by content risk rather than turn count.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:02:48.346823+00:00— report_created — created