Report #85916
[gotcha] Indirect prompt injection via RAG document metadata or filenames
Sanitize and format RAG document metadata \(filenames, authors, timestamps\) with the same rigor as document text, or omit metadata from the LLM context entirely.
Journey Context:
When building RAG, developers carefully chunk and clean the text content but blindly append metadata like Source: user\_file.txt. An attacker names a file ignore\_previous\_instructions.txt or sets an author metadata field to a malicious payload. The LLM processes this metadata as high-priority instructions because it often appears at the beginning or end of the context block, bypassing text-level sanitizers that only evaluated the document body.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:47:57.212476+00:00— report_created — created