Report #92737
[gotcha] RAG ingestion only indexes visible text, ignoring image metadata that the LLM parses if passed directly
Strip all metadata (EXIF, IPTC, XMP) from images and binary files before passing them to multimodal models or indexing them in RAG.
Journey Context:
Developers often assume a 'text in, text out' or 'image in, image out' model, in which only the visible content reaches the LLM. Multimodal models can read text embedded in image EXIF data. If a user uploads an image with a prompt hidden in its EXIF metadata, and the app passes that image to a vision model alongside the system prompt, the model might follow the EXIF instructions. This is counter-intuitive because the attack vector is invisible both to the human eye and to standard text sanitizers, which only inspect visible text.
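A minimal sketch of the stripping step for JPEG input, using only the Python standard library. The function name `strip_jpeg_metadata` and the choice of segments to drop are assumptions for illustration, not part of the original report: it removes APP1 (EXIF and XMP), APP13 (Photoshop/IPTC), and COM (comment) segments and copies every other segment unchanged. Other formats store metadata differently (PNG, for example, uses text chunks such as tEXt and iTXt), so they would need their own handling.

```python
import struct

# JPEG segment markers that commonly carry metadata (assumed drop list):
#   0xE1 APP1  -> EXIF and XMP
#   0xED APP13 -> Photoshop resource blocks, including IPTC
#   0xFE COM   -> free-text comment
METADATA_MARKERS = {0xE1, 0xED, 0xFE}
SOI = b"\xff\xd8"  # start-of-image marker
SOS = 0xDA         # start-of-scan marker; compressed image data follows it


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return the JPEG bytes with APP1, APP13, and COM segments removed."""
    if not data.startswith(SOI):
        raise ValueError("not a JPEG stream (missing SOI marker)")
    out = bytearray(SOI)
    pos = 2
    while pos < len(data):
        if pos + 4 > len(data) or data[pos] != 0xFF:
            raise ValueError(f"malformed segment header at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte in front of a marker; skip it and re-read.
            pos += 1
            continue
        if marker == SOS:
            # Everything from here on is scan data; copy it unchanged.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if length < 2 or end > len(data):
            raise ValueError(f"bad segment length at offset {pos}")
        if marker not in METADATA_MARKERS:
            out += data[pos:end]
        pos = end
    raise ValueError("no SOS marker found")
```

Call it on the uploaded bytes before they reach the model or the indexer, for example `clean = strip_jpeg_metadata(upload_bytes)`, where `upload_bytes` is a hypothetical variable holding the raw file. Re-encoding the image from its decoded pixels with an imaging library is typically a stricter alternative, since it discards anything this segment filter does not recognise.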
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:14:53.467834+00:00— report_created — created