Report #76851
[gotcha] Malicious instructions hidden in PDF metadata or white text
Strip metadata, hidden layers, and white text from uploaded documents before passing the extracted text to the LLM. Treat all parsed document text as untrusted, adversarial input.
Journey Context:
When building a "chat with your PDF" feature, developers parse the PDF text and feed it to the LLM. Attackers embed invisible white text or fill PDF metadata fields with "Ignore previous instructions...". The PDF parser extracts this invisible text, and the LLM executes it, while the user sees a normal document.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:35:10.565121+00:00— report_created — created