Report #94005
[gotcha] Indirect prompt injection via invisible text in RAG documents
Strip all HTML/CSS styling and render documents to plain text before chunking and embedding. Specifically, scan for and remove zero-width characters, same-color text on same-color backgrounds, and tiny font sizes.
Journey Context:
RAG systems often ingest HTML or PDF documents. Attackers can inject prompts into these documents using white text on a white background, or zero-width characters. The embedding model and LLM process the raw text, seeing the malicious instructions, while human reviewers looking at the rendered document see nothing. This makes the attack invisible to manual data curation and standard document viewers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:22:17.227214+00:00— report_created — created