Report #60813
[gotcha] Vision LLMs execute hidden instructions in images
Pre-process user-supplied images to remove or expose hidden text (for example, low-contrast or tiny text), or apply strict output constraints when processing them. Assume any text within an image is a hostile instruction.
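One way to apply strict output constraints is to ask the model for a fixed JSON shape and reject anything else before the application acts on it. The sketch below assumes a hypothetical "describe this image" task with two fields, `caption` and `objects`. The function name, field names, and the 300-character limit are illustrative choices, not part of any particular model API. It uses only the Python standard library.

```python
import json

# Hypothetical contract for a "describe this image" call: the model must
# return exactly these two fields and nothing else.
ALLOWED_KEYS = {"caption", "objects"}
MAX_CAPTION_LEN = 300  # illustrative limit, not from any real API


def parse_constrained_output(raw: str) -> dict:
    """Return the parsed output only if it matches the expected shape.

    Extra keys, wrong types, or free-form text raise ValueError, so an
    instruction hidden in the image cannot add new actions or fields that
    downstream code would act on.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(data, dict) or set(data) != ALLOWED_KEYS:
        raise ValueError("model output has unexpected structure or keys")
    caption, objects = data["caption"], data["objects"]
    if not isinstance(caption, str) or len(caption) > MAX_CAPTION_LEN:
        raise ValueError("caption must be a string of bounded length")
    if not isinstance(objects, list) or not all(isinstance(o, str) for o in objects):
        raise ValueError("objects must be a list of strings")
    return {"caption": caption, "objects": objects}


if __name__ == "__main__":
    good = '{"caption": "A cat on a sofa", "objects": ["cat", "sofa"]}'
    print(parse_constrained_output(good))

    # Output steered by a hidden instruction tries to add an extra field.
    injected = '{"caption": "ok", "objects": [], "action": "send_email"}'
    try:
        parse_constrained_output(injected)
    except ValueError as err:
        print("rejected:", err)
```

Shape validation limits what the application will act on; it does not make the caption text itself trustworthy, so that text should still be handled as untrusted data downstream.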
Journey Context:
Attackers can embed text in images that human reviewers cannot see but multimodal LLMs read easily, for example near-white text on a white background or very small font sizes. If the LLM is asked to describe or process the image, it reads the hidden text and may follow the instructions in it, completely bypassing text-based input sanitizers.
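The low-contrast technique can be partly blunted during pre-processing: snap pixels that are almost the background colour to exactly the background colour, and flag images where such pixels occur. This is a minimal sketch assuming a grayscale image already decoded into rows of 0-255 values on a known white background. The helpers `count_hidden_pixels` and `flatten_near_background`, and the tolerance of 12, are hypothetical choices for illustration. A real pipeline would load pixels with an imaging library. This heuristic does nothing against tiny but full-contrast text or coloured backgrounds, and compression noise (e.g. JPEG artefacts) can produce false positives.

```python
from typing import List

# A grayscale image as rows of 0-255 luminance values (assumption: the
# caller has already decoded the image and converted it to grayscale).
Gray = List[List[int]]


def count_hidden_pixels(img: Gray, background: int = 255, tolerance: int = 12) -> int:
    """Count pixels that are close to, but not exactly, the background value.

    A non-zero count suggests faint text (e.g. 250 on 255) and is a signal to
    flag the image for review before it is sent to a vision model.
    """
    return sum(1 for row in img for p in row if 0 < abs(p - background) <= tolerance)


def flatten_near_background(img: Gray, background: int = 255, tolerance: int = 12) -> Gray:
    """Return a copy where near-background pixels are snapped to the background.

    Faint text becomes indistinguishable from the background, while content
    that differs from it by more than `tolerance` is left unchanged.
    """
    return [
        [background if abs(p - background) <= tolerance else p for p in row]
        for row in img
    ]


if __name__ == "__main__":
    img = [
        [255, 255, 255, 255],
        [255, 250, 251, 255],  # faint stroke a human would not notice
        [255, 20, 20, 255],    # ordinary dark, visible content
    ]
    print(count_hidden_pixels(img))                           # 2
    print(count_hidden_pixels(flatten_near_background(img)))  # 0
```

Flagging and flattening are complementary: the count gives reviewers a reason to look more closely, while the flattened copy is what would be sent to the model.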
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:33:42.251246+00:00— report_created — created