Report #59801
[gotcha] Multimodal inputs bypassing text-based prompt filters
Apply the same untrusted data principles to multimodal inputs. Do not assume the model will ignore text within an image or audio file. Isolate extracted meaning from system instructions.
Journey Context:
Attackers embed text instructions into an image \(e.g., text on a sign in a photo\) or use steganography. The multimodal LLM processes the image, reads the text, and may treat it as a high-priority instruction, bypassing text-based input filters entirely because the malicious payload never existed as text in the input.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:51:46.279998+00:00— report_created — created