Report #67558
[gotcha] Hidden text in images hijacking multimodal LLM behavior
Treat images from untrusted sources as adversarial. Do not feed untrusted images into multimodal LLMs that have access to tools or sensitive context without isolation. Where feasible, run OCR on each image and inspect the extracted text before passing the image to the LLM.
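The OCR pre-processing step could be sketched roughly as follows. This is a minimal illustration, not a vetted filter: the `ocr` callable is a hypothetical stand-in for whatever OCR engine is in use, and the phrase patterns are illustrative assumptions that a determined attacker can evade. Low-contrast text (such as white on white) may also be missed by OCR unless the image is preprocessed, so a clean result does not prove the image is safe.

```python
import re
from typing import Callable, List

# Illustrative patterns only (an assumption, not an exhaustive list).
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"disregard\s+(the\s+)?(system|previous)\s+(prompt|instructions)", re.I),
    re.compile(r"you\s+are\s+now\b", re.I),
]


def find_injection_hints(ocr_text: str) -> List[str]:
    """Return suspicious phrases found in text extracted from an image."""
    # Collapse line breaks and repeated spaces so phrases split across
    # OCR lines still match.
    normalized = " ".join(ocr_text.split())
    hits = []
    for pattern in SUSPICIOUS_PATTERNS:
        match = pattern.search(normalized)
        if match:
            hits.append(match.group(0))
    return hits


def screen_image(image_bytes: bytes, ocr: Callable[[bytes], str]) -> bool:
    """Return True if no hints were found, False if the image should be held.

    `ocr` is a hypothetical callable that maps raw image bytes to text.
    """
    return not find_injection_hints(ocr(image_bytes))
```

An image that fails the screen is better routed to manual review or a tool-less model than silently dropped, since the patterns will also flag some benign images (for example, screenshots of articles about prompt injection).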
Journey Context:
With the advent of multimodal models (like GPT-4V), developers allow users to upload images, assuming the LLM will just 'describe' them. Attackers can embed text in an image (e.g., white text on a white background, or small text at the bottom) that says 'Ignore previous instructions and...'. The vision model reads that text, and because it arrives as part of the model's input context, it acts as a prompt injection. The model cannot inherently distinguish between an instruction the user typed and text it read from an image, so the injected text gets the same privilege as a typed instruction.
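Because the model itself cannot separate the two sources, the separation has to happen outside the model. One way to express the isolation idea is to decide tool access and context exposure from the provenance of the input before the model is ever called. The sketch below assumes a hypothetical `RequestPolicy` type and hypothetical tool names; it shows only the gating decision, not a real LLM client.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical tool names, used only for illustration.
ALL_TOOLS: FrozenSet[str] = frozenset({"send_email", "read_files", "web_search"})


@dataclass(frozen=True)
class RequestPolicy:
    """What a single model call is allowed to see and do."""
    allowed_tools: FrozenSet[str]
    include_sensitive_context: bool


def policy_for(has_untrusted_image: bool) -> RequestPolicy:
    """Pick a policy based on where the input came from.

    Any request carrying an untrusted image gets no tools and no
    sensitive context, so injected text has nothing to act on.
    """
    if has_untrusted_image:
        return RequestPolicy(allowed_tools=frozenset(),
                             include_sensitive_context=False)
    return RequestPolicy(allowed_tools=ALL_TOOLS,
                         include_sensitive_context=True)
```

The design choice here is to restrict by default whenever an untrusted image is present, rather than trying to detect whether a given image is malicious.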
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T19:52:46.614308+00:00— report_created — created