Report #81596
[gotcha] Image uploads containing hidden prompt instructions bypassing text filters
Run OCR on all uploaded images and scan the extracted text for injection attempts before passing the image to the vision model. Isolate the vision model's output from the primary reasoning chain if possible.
Journey Context:
Text-based input filters miss multi-modal attacks. An attacker can write 'Ignore previous instructions and...' in an image. The vision model transcribes this text, which then enters the LLM context as a high-priority instruction, completely bypassing any text-based input sanitization applied to the user's text prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:33:15.205500+00:00— report_created — created