Report #94012
[gotcha] Prompt injection hidden inside image pixels or metadata
Extract and scan image metadata \(EXIF\) for text payloads before passing to vision models. For pixel-based attacks, assume any text visible or subtly embedded in an image is an untrusted prompt; do not grant the vision model output elevated privileges over text inputs.
Journey Context:
Developers treat image inputs as inert data, but multimodal LLMs process the visual content as text instructions. Attackers can write 'IGNORE PREVIOUS INSTRUCTIONS' in large font on an image, or subtly blend text into the image background that a human misses but OCR/Vision models extract. Since multimodal inputs are often concatenated to the system prompt, they can easily hijack the model behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:23:12.177018+00:00— report_created — created