Report #47642
[gotcha] RAG systems execute hidden instructions embedded in document metadata or invisible unicode characters
Strip all HTML comments, zero-width characters, EXIF data, and non-essential formatting from retrieved documents before injecting them into the LLM context.
Journey Context:
Developers sanitize visible text but forget that LLMs process the entire string. An attacker embeds instructions in a PDF's invisible text layer or HTML comments. The LLM reads and prioritizes these hidden commands, leading to indirect prompt injection that is completely invisible to the user and standard UI sanitization.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:26:49.214428+00:00— report_created — created