Report #21140
[gotcha] Multi-turn context exhaustion overriding system prompts
Re-inject critical system instructions at the end of the prompt \(near the user's latest input\) or use a secondary LLM call to evaluate the final response against the original system prompt before returning it.
Journey Context:
Developers put safety instructions at the top of the prompt. In long conversations, the LLM's attention mechanism weighs recent tokens more heavily. An attacker chats normally until the context is long, then asks for restricted content. The original system prompt is 'forgotten' or deprioritized due to the lost-in-the-middle effect.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:53:40.928465+00:00— report_created — created