Report #73737
[gotcha] RAG systems executing hidden instructions in document formatting
Strip formatting, metadata, and invisible content (zero-width characters, white or otherwise hidden text) from documents before chunking and embedding, and again before passing retrieved context to the LLM.
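A minimal sketch of the invisible-character part of this step, assuming the extracted text is already a Python string. The function name `strip_invisible` and the choice to keep tab, newline, and carriage return are illustrative assumptions, not part of the original report.

```python
import unicodedata

# Assumption: these are the only control characters worth preserving.
_KEEP = {"\t", "\n", "\r"}


def strip_invisible(text: str) -> str:
    """Drop Unicode format (Cf) and control (Cc) characters.

    Category Cf covers zero-width space (U+200B), zero-width joiner and
    non-joiner, the byte order mark (U+FEFF), and bidirectional controls.
    """
    return "".join(
        ch
        for ch in text
        if ch in _KEEP or unicodedata.category(ch) not in ("Cf", "Cc")
    )


if __name__ == "__main__":
    dirty = "Refund window:\u200b 30\u200d days\ufeff"
    print(repr(strip_invisible(dirty)))
```

Removing every Cf character is deliberately aggressive: it also breaks emoji joiner sequences and the zero-width non-joiner that some scripts rely on, so a narrower deny-list may suit multilingual corpora better. It does nothing about white or hidden text, which has to be handled at extraction time.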
Journey Context:
When ingesting PDFs or HTML for RAG, developers often extract text verbatim. Attackers can embed instructions in white text (invisible to human reviewers) or in zero-width characters. When the RAG system retrieves such a chunk, the LLM may treat the hidden text as instructions, ignoring the system prompt and performing malicious actions. Because the text never shows up in the rendered document, it also bypasses human data auditing.
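For the HTML case, one way to avoid verbatim extraction is to skip elements whose inline style hides them. The sketch below uses only the standard library; the `VisibleTextExtractor` class, the `visible_text` helper, and the style heuristics are hypothetical. It covers inline `style` attributes and the `hidden` attribute only, so text hidden through CSS classes, external stylesheets, or PDF rendering tricks is out of scope.

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristic: inline styles commonly used to hide text.
# The lookbehind keeps "background-color: white" from matching.
_HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0(?![.\d])|"
    r"(?<![-\w])color\s*:\s*(?:white|#fff(?:fff)?)\b",
    re.IGNORECASE,
)
# Void elements have no end tag, so they must not affect nesting depth.
_VOID = {"area", "base", "br", "col", "embed", "hr", "img",
         "input", "link", "meta", "source", "track", "wbr"}


class VisibleTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self.hidden_depth = 0  # > 0 while inside a hidden subtree

    def handle_starttag(self, tag, attrs):
        if tag in _VOID:
            return
        attr_map = dict(attrs)
        style = attr_map.get("style") or ""
        hidden = (
            tag in ("script", "style")
            or "hidden" in attr_map
            or _HIDDEN_STYLE.search(style) is not None
        )
        # Once inside a hidden subtree, every nested tag deepens it.
        if self.hidden_depth or hidden:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in _VOID and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs so chunking sees clean text.
    return " ".join(" ".join(parser.parts).split())
```

With unclosed tags inside a hidden element the depth counter can stay above zero, so this sketch errs toward dropping text rather than leaking it. A rendering-based check (comparing extracted text with what a browser or PDF viewer actually displays) would be more thorough but is heavier to run.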
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:21:43.625948+00:00— report_created — created