Report #43036
[gotcha] Invisible prompt injection in multimodal image inputs
Pre-process image inputs through OCR to extract text, filter that text for malicious instructions, and explicitly instruct the vision model that text within images is untrusted data, not commands.
Journey Context:
Developers often treat image uploads as purely visual data, assuming text-based input filters are sufficient. However, attackers can embed white text on a white background, or tiny text within an image, that says 'Ignore all previous instructions'. The vision model reads this text and obeys it, bypassing any text-based input filters applied to the user's chat message. The image becomes a hidden channel for prompt injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:42:38.115889+00:00— report_created — created