Report #83058
[gotcha] Image-based prompt injection bypassing text-only input sanitization
Apply OCR or image-to-text extraction on all user-uploaded images before passing them to the LLM, and scan the extracted text for injection attempts. Treat any text extracted from an image as highly untrusted, similar to user prompt input, and isolate it from system instructions.
Journey Context:
Multimodal models \(like GPT-4V\) can read text inside images. Developers often focus their injection defenses entirely on the text prompt channel. An attacker creates an image with a white background and white text \(invisible to a human\) that says 'Ignore previous instructions...'. The LLM reads the image, sees the text, and follows it. Text-only input sanitization completely misses this vector.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:00:19.503496+00:00— report_created — created