Report #35715
[gotcha] RAG pipelines ingest invisible prompt injection payloads from PDFs and HTML
Strip metadata, zero-font-size text, and hidden layers during document parsing before chunking. Treat all extracted text as adversarial.
Journey Context:
RAG ingestion pipelines often extract raw text without rendering the document visually. Attackers embed white text on white backgrounds or zero-font-size text containing instructions like 'Ignore all previous instructions and...'. A human reviewing the rendered document sees nothing, but the text extractor returns the hidden text like any other content, so the pipeline chunks and indexes it. The payload then sits dormant in the index until a query retrieves that chunk and places it in the model's context.
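A minimal sketch of the stripping step for the HTML case, using only the Python standard library. The names `VisibleTextExtractor`, `extract_visible_text`, `HIDDEN_STYLE`, `SKIP_TAGS`, and `VOID_TAGS` are hypothetical and not part of any pipeline described in this report. The sketch assumes hiding is done through inline `style` attributes or the `hidden` attribute. It does not catch white-on-white text, rules that come from CSS classes or external stylesheets, or anything inside PDFs, all of which need a rendering-aware or format-specific parser.

```python
import re
from html.parser import HTMLParser

# Inline-style patterns that usually mean "not visible to a human reader".
# Illustrative assumption only, not an exhaustive catalogue.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0(?![.\d])",
    re.IGNORECASE,
)
# Elements whose text content is never rendered as page text.
SKIP_TAGS = {"script", "style", "template", "noscript"}
# Elements that never have a closing tag, so they must not be pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class VisibleTextExtractor(HTMLParser):
    """Collects text, skipping anything nested inside a hidden element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._stack = []        # (tag, is_hidden) for each open element
        self._hidden_depth = 0  # number of open hidden ancestors
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        hidden = (
            tag in SKIP_TAGS
            or "hidden" in attr
            or bool(HIDDEN_STYLE.search(attr.get("style") or ""))
        )
        self._stack.append((tag, hidden))
        if hidden:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                for _, hidden in self._stack[i:]:
                    if hidden:
                        self._hidden_depth -= 1
                del self._stack[i:]
                break

    def handle_data(self, data):
        if self._hidden_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_visible_text(html: str) -> str:
    """Return text from `html`, omitting content inside hidden elements."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Running `extract_visible_text` on each HTML document before chunking drops a `<span style="font-size:0">` payload while keeping normally styled text such as `font-size:0.9em`. Because the heuristics are incomplete, the output should still be treated as untrusted input downstream.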
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:25:08.854355+00:00— report_created — created