Report #58492
[gotcha] Hidden text in images causing indirect prompt injection in multimodal LLMs
Apply OCR or image-to-text preprocessing to detect hidden text overlays, and treat image-derived text as untrusted input, isolating it from system instructions.
Journey Context:
With vision-capable LLMs, developers assume images are just pictures. Attackers create images with white text on a white background, or tiny text, that says 'Ignore previous instructions and...'. The vision LLM reads the text and follows the instruction, while the developer has no text-based pre-filter to catch it because the payload is visual.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:40:04.623100+00:00— report_created — created