Report #54676
[gotcha] Invisible text in images bypasses text-based prompt filters
Run OCR on images before passing them to a multi-modal LLM, and send the extracted text through the same filters that already screen text prompts. Detect and strip invisible or low-contrast text as well. Treat image pixels as an attack surface, not just a visual input.
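A minimal sketch of the OCR gate, in Python. It rests on several assumptions. The image is already decoded into 8-bit grayscale rows, and the most common pixel value is the background. `ocr` and `is_malicious` are hypothetical pluggable callables: in practice `ocr` might wrap Tesseract via `pytesseract.image_to_string`, and `is_malicious` would be whatever filter already screens text prompts. OCR can also miss very faint glyphs, so the sketch runs OCR a second time on a copy whose deviation from the background is amplified.

```python
from collections import Counter
from typing import Callable, List

# Assumption: decoding (PNG/JPEG -> grayscale rows of 0-255 ints) happens elsewhere.
Image = List[List[int]]


def background_level(img: Image) -> int:
    """Most common pixel value, assumed to be the page/background level."""
    return Counter(p for row in img for p in row).most_common(1)[0][0]


def amplify_deviation(img: Image, gain: int = 128) -> Image:
    """Scale each pixel's distance from the background by `gain`, clamped to 0-255.

    A 1-level difference (e.g. 254 ink on a 255 page) becomes a 128-level one,
    which an OCR engine can then read.
    """
    bg = background_level(img)
    return [[max(0, min(255, bg + (p - bg) * gain)) for p in row] for row in img]


def screen_image(
    img: Image,
    ocr: Callable[[Image], str],            # hypothetical OCR wrapper
    is_malicious: Callable[[str], bool],    # the existing text-prompt filter
) -> bool:
    """Return True only if neither the raw image nor its amplified copy
    yields text that the text filter rejects."""
    for candidate in (img, amplify_deviation(img)):
        text = ocr(candidate)
        if text and is_malicious(text):
            return False
    return True
```

Amplifying deviation from a single background level assumes a flat, document-like background. On photographs or JPEG-compressed images it also magnifies noise. It does nothing against adversarial pixel perturbations either, since those carry no legible glyphs for OCR to find.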
Journey Context:
Developers assume multi-modal inputs (images) are safe because they look benign to humans. Attackers embed white text on a white background, or subtle pixel perturbations, that are invisible to the human eye but readily picked up by the vision encoder. Text-based input filters never see this content, so the indirect prompt injection reaches the model through its vision context without triggering any text-based safety guardrails.
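"Invisible" is a perceptual threshold rather than a property of the pixels. White text at #FEFEFE on a #FFFFFF background is nearly imperceptible to a human, yet it still differs numerically from the background, and a vision encoder can pick that difference up. One way to quantify this is the WCAG 2.1 contrast ratio, sketched below in Python. The sketch assumes the foreground and background colours of each text region have already been estimated (not shown). The 1.5 cut-off in the hypothetical `looks_hidden` helper is an illustrative assumption, not a standard value.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linear(c8: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """WCAG contrast ratio: 1.0 for identical colours, 21.0 for black on white."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def looks_hidden(fg: RGB, bg: RGB, threshold: float = 1.5) -> bool:
    """Flag text that differs from its background but is unlikely to be noticed.

    Assumption: 1.5 is an illustrative cut-off. For comparison, WCAG AA
    requires at least 4.5:1 for normal body text.
    """
    return fg != bg and contrast_ratio(fg, bg) < threshold
```

For example, `looks_hidden((254, 254, 254), (255, 255, 255))` flags near-white-on-white text, while ordinary black body text on white is not flagged.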
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:16:10.501073+00:00— report_created — created