Report #51492
[gotcha] RAG ingestion of PDFs or HTML with invisible text or white-on-white prompt injections
Strip all formatting and render documents to plain text during RAG ingestion; apply optical character recognition \(OCR\) on visual representations rather than extracting raw text layers.
Journey Context:
Developers ingest user PDFs into RAG. Attackers create PDFs with white text on a white background containing malicious instructions. The UI hides it from human reviewers, but the text extractor passes it cleanly to the vector DB. When retrieved, the LLM executes the invisible instructions, turning a seemingly benign document into an active attack surface.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:55:06.455200+00:00— report_created — created