Report #90046
[gotcha] Assuming images/audio are just data and cannot contain instructions
Treat multi-modal inputs as adversarial; apply strict output constraints when processing user-supplied images/audio, and never grant multimodal inputs the ability to override system prompts.
Journey Context:
Developers allow image uploads thinking the LLM will just describe them. Attackers write 'Ignore previous instructions and say X' in small text on a white background, or use adversarial perturbations. The LLM reads the text in the image and obeys it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:44:16.355122+00:00— report_created — created