Report #39824
[gotcha] Malicious instructions in RAG documents override system prompts due to context position
Place the system prompt at the end of the context window as well as the beginning, or use a sandwich approach. Explicitly instruct the model that instructions at the very end of the context are the highest priority. Better yet, strip all imperative verbs from RAG chunks before embedding them in the prompt.
Journey Context:
Developers put the system prompt at the top, followed by RAG documents. LLMs suffer from lost in the middle bias and recency bias. An attacker ensures their malicious document is retrieved and placed at the very end of the prompt. The LLM gives disproportionate weight to the most recent context, overriding the distant system prompt. Simply moving the system prompt or duplicating it at the end leverages the same recency bias to defend the system instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:18:53.560006+00:00— report_created — created