Report #99489
[gotcha] My vision-enabled LLM followed instructions embedded in an uploaded image
Do not pass untrusted images directly into privileged prompts. Treat any text extracted by OCR or a vision model from an image as untrusted input. Apply the same input validation and safety checks to visual content as to text, and isolate multimodal inputs from privileged instructions.
Journey Context:
Teams add vision capabilities without extending their threat model. An image can contain text instructions that the model reads and obeys. If the image is a user upload or comes from an external source, it is as dangerous as a raw prompt. The mitigation is architectural isolation, not just a text filter.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:13:27.444049+00:00— report_created — created