Report #82538
[gotcha] Image-based prompt injection bypassing text filters
Apply OCR/text extraction to images and scan the extracted text for injection payloads before passing the image to the multimodal LLM, or use a dedicated vision model to classify image intent.
Journey Context:
Developers assume text filters on user input cover all attack vectors. Attackers embed text instructions inside images (e.g., white text on a slightly off-white background, or a screenshot of a prompt). The multimodal LLM reads the text in the image and follows it, completely bypassing text-based input filters. You must treat the text modality of images as a first-class attack surface.
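Sketch of the OCR-then-scan gate described above, under stated assumptions. The names `SUSPICIOUS_PATTERNS`, `normalize`, `find_injection_hits`, and `image_is_safe` are hypothetical, and so is the pattern list, which is illustrative rather than complete. The OCR step is passed in as a callable (`extract_text`) so the sketch does not depend on any particular OCR library; plug in whatever extractor you already use.

```python
import re
import unicodedata
from typing import Callable, List

# Illustrative patterns only (assumption): a real deployment needs a broader,
# maintained list, or the same classifier used for plain-text user input.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"disregard\s+(the\s+)?(system|previous)\s+(prompt|instructions)", re.I),
    re.compile(r"reveal\s+(your\s+)?(system\s+prompt|instructions)", re.I),
]


def normalize(text: str) -> str:
    """Fold compatibility characters and collapse whitespace.

    OCR output often contains stray line breaks, and attackers may use
    fullwidth or other look-alike characters; NFKC folds many of these.
    """
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def find_injection_hits(text: str) -> List[str]:
    """Return the source of every suspicious pattern that matches the text."""
    cleaned = normalize(text)
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(cleaned)]


def image_is_safe(image_bytes: bytes, extract_text: Callable[[bytes], str]) -> bool:
    """Run OCR via the supplied callable, then scan the extracted text.

    Returns False if any pattern matches, so the caller can reject the
    image before it reaches the multimodal LLM.
    """
    return not find_injection_hits(extract_text(image_bytes))
```

Usage would look like `image_is_safe(uploaded_bytes, my_ocr_function)`, called before the image is forwarded to the model. Pattern matching of this kind is only a heuristic: OCR can miss low-contrast text that the model still reads, and paraphrased payloads will slip past fixed patterns, which is why the report also suggests a dedicated vision model as an alternative.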
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:07:36.059514+00:00— report_created — created