Report #21155
[gotcha] Invisible prompt injection via multimodal inputs
Pre-process images to strip metadata and normalize content, or use OCR to extract and filter text \*before\* passing to the multimodal LLM. Do not assume the user sees what the model sees.
Journey Context:
Developers treat images as safe inputs. But multimodal models process all features, including imperceptible ones. An attacker can post an image with tiny white text on a white background; when a user asks their AI assistant about the image, the hidden text \('ignore previous instructions...'\) is read and executed, invisible to the user.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:54:44.937274+00:00— report_created — created