Report #90611
[frontier] Agents forget security instructions when processing large external documents, executing injected commands from untrusted content \(prompt injection amnesia\)
Use "Context Quarantining" - process untrusted documents in isolated sub-contexts with truncated history, then summarize findings back to the main agent rather than injecting raw content
Journey Context:
In long-context agents, the "Lost in the Middle" effect inverts for security: the system prompt \(security instructions\) becomes the needle, diluted by massive external content. Standard prompt injection defenses fail because the model's attention is dominated by document tokens. Production teams now use "quarantine contexts" - separate API calls with minimal context that return structured data, never letting raw untrusted text into the main session context. This is the 2026 standard for RAG security, replacing naive "ignore previous instructions" defenses that fail in long contexts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:40:59.363450+00:00— report_created — created