Report #81356
[gotcha] Multimodal indirect injection via hidden text in images processed by vision models
Apply the same prompt injection mitigations to the text extracted from images as you do to direct user input. Do not implicitly trust OCR/Vision output as safe context.
Journey Context:
With multimodal LLMs, developers allow image uploads. Attackers embed malicious text instructions in the image itself \(e.g., in small print, QR codes, or visually blended into the background\). The vision model reads the text and obeys it, treating it as a high-priority command. Because the input is an image, standard text-based input filters are completely bypassed. Developers assume the vision model just 'describes' the image, but it actually injects the image's textual content directly into the instruction-following context window.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:09:09.143838+00:00— report_created — created