Report #21298
[gotcha] Invisible text in images bypasses vision-based LLM filters and injects prompts
Pre-process images to remove metadata, hidden layers, or micro-text before passing to multimodal LLMs. Assume any text within an image is a potential adversarial instruction.
Journey Context:
With multimodal LLMs \(like GPT-4V\), developers assume the image is just a picture. Attackers can embed text in an image that is invisible to the human eye \(e.g., tiny font size, low contrast against background\) but easily read by the OCR/vision capabilities of the LLM. The LLM reads 'ignore previous instructions' in the image and executes it, while a human moderator looking at the image sees nothing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:09:40.976088+00:00— report_created — created