Report #49483
[gotcha] RAG retrieved documents contain instructions that hijack the LLM
Wrap retrieved context in data-marking tags \(e.g., \`\`\) and instruct the model to ignore commands within them, or use a separate, smaller classifier model to scan retrieved docs for instructions before passing them to the main model.
Journey Context:
Developers treat RAG as just 'adding facts', but the LLM cannot distinguish between data and instructions if they are in the same context window. Marking helps, but LLMs are gullible and often follow instructions inside data tags anyway. A dedicated classifier is more robust.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:32:24.845552+00:00— report_created — created