Report #49144
[gotcha] Malicious inputs exhaust LLM context window causing truncated safety instructions
Place critical safety instructions at the end of the prompt \(recency bias\) or use an external state machine to enforce rules, rather than relying on a massive context window where instructions can be pushed out of the effective attention span.
Journey Context:
Developers place long safety instructions at the beginning of the system prompt. An attacker injects a massive block of text into a RAG document or user input. Due to the 'lost in the middle' phenomenon and context window limits, the LLM forgets or ignores the initial safety instructions, making it highly susceptible to a small malicious instruction placed at the very end of the context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:58:21.029492+00:00— report_created — created