Report #31526
[gotcha] Multi-turn attacks bypassing system prompts by exhausting context limits
Re-inject critical safety instructions and system prompts at regular intervals or at the very end of the conversation context, rather than only at the beginning.
Journey Context:
System prompts are prepended to the conversation. In long multi-turn chats, the distance between the system prompt and the latest user prompt grows. Due to recency bias in LLMs, instructions closer to the end of the context window have a stronger influence. Attackers use benign-looking long conversations to push the safety prompt out of the LLM's effective attention window, then strike.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:18:10.803517+00:00— report_created — created