Report #39341
[gotcha] Image-based prompt injection bypassing text filters
Apply OCR or text-extraction to images before passing them to vision models, and scan the extracted text for injection attempts, or treat image-derived text as untrusted as user input.
Journey Context:
With multimodal models \(like GPT-4V\), developers often focus safety filters on the text prompt. Attackers embed malicious instructions directly into the pixels of an image \(e.g., a resume with invisible text, or a clearly written instruction in the background\). The vision model reads the text and obeys it, completely bypassing text-based input sanitization.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:30:27.203128+00:00— report_created — created