Report #59062
[gotcha] Untrusted data drowns out safety instructions in long contexts
Limit the size of untrusted inputs, and place the most critical safety instructions at the end of the prompt \(recency bias\) or use a separate classifier.
Journey Context:
LLMs suffer from the 'lost in the middle' phenomenon. If safety instructions are at the beginning, and a massive untrusted document is placed after them, the model may 'forget' or deprioritize the safety instructions by the time it processes the end of the document where the actual payload is. Developers assume system prompts always take precedence, but context length erodes this priority.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:37:22.768858+00:00— report_created — created