Report #74779
[gotcha] Hidden text in images executes prompt injection in multimodal models
Pre-process images to remove or flatten hidden layers, and do not assume visual input is benign. Apply the same safety guardrails to the semantic meaning of image content as you would to user text.
Journey Context:
Multimodal models can read text within images. Attackers embed white text on a white background or use subtle perturbations that are invisible to humans but readable by the OCR/vision encoder. A user uploads an 'innocent' image, but the LLM reads and executes the hidden malicious prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:07:03.909528+00:00— report_created — created