Report #22210
[gotcha] Hidden prompt injection in images via steganography or imperceptible text
Treat every user-provided image as a fully hostile prompt. Do not place user-uploaded images in the same context window as sensitive system prompts when the output is shared with other people or consumed by automation.
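One way to make this rule hard to bypass is to encode it as a guard that runs before any model call. The sketch below is only an illustration: `VisionRequest`, its three flags, and `may_share_context` are hypothetical names, not part of any real SDK, and how each flag gets set is left to the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionRequest:
    """Hypothetical description of one model call (all fields are assumptions)."""
    has_user_image: bool                  # an untrusted, user-uploaded image is attached
    system_prompt_is_sensitive: bool      # the system prompt holds secrets or privileged rules
    output_is_shared_or_automated: bool   # output reaches other users or downstream tools


def may_share_context(req: VisionRequest) -> bool:
    """Return False when the image and the sensitive prompt must not share a context window."""
    risky = (
        req.has_user_image
        and req.system_prompt_is_sensitive
        and req.output_is_shared_or_automated
    )
    return not risky


# Example: an automated pipeline that attaches a user upload to a sensitive prompt.
blocked = VisionRequest(True, True, True)
allowed = VisionRequest(True, False, True)
print(may_share_context(blocked), may_share_context(allowed))
```

When the guard returns False, the caller could route the image to a separate call that has no sensitive prompt and no tool access, then pass only a reviewed result onward.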
Journey Context:
Developers often assume images are just pictures, but Vision-Language Models (VLMs) read every pixel, including text a person cannot see. An attacker can write instructions in white text on a white background, or use subtle font changes that are invisible to the human eye. The model follows the hidden text while human moderators reviewing the chat see a benign image. In that case the model's compliance with the visual text can override its system instructions.
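The white-on-white case can be illustrated with a toy grayscale grid. This is a minimal sketch, assuming an image is already decoded into rows of 0-255 values. The `Gray` alias, both function names, and the `max_visible_delta` threshold are hypothetical choices for illustration. The check only covers a whole image or a tile the caller has cropped, and it does not address true steganography or text hidden inside busy regions.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values (assumed in-memory format)


def stretch_contrast(pixels: Gray) -> Gray:
    """Linearly rescale so the darkest pixel becomes 0 and the brightest 255."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    if lo == hi:
        # Truly uniform region: there is nothing to reveal.
        return [[255 for _ in row] for row in pixels]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in pixels]


def looks_like_hidden_content(pixels: Gray, max_visible_delta: int = 8) -> bool:
    """Flag a region whose intensity range is non-zero but too small to notice."""
    flat = [v for row in pixels for v in row]
    delta = max(flat) - min(flat)
    return 0 < delta <= max_visible_delta


WHITE, INK = 255, 253  # a 2-level difference is effectively invisible on screen
image = [
    [WHITE, WHITE, WHITE, WHITE, WHITE],
    [WHITE, INK,   INK,   INK,   WHITE],
    [WHITE, WHITE, INK,   WHITE, WHITE],
    [WHITE, WHITE, INK,   WHITE, WHITE],
    [WHITE, WHITE, WHITE, WHITE, WHITE],
]

if looks_like_hidden_content(image):
    revealed = stretch_contrast(image)  # the "T" shape is now black on white
    for row in revealed:
        print("".join("#" if v < 128 else "." for v in row))
```

A check like this is a cheap pre-filter for human reviewers, not a defense on its own. The report's rule of keeping untrusted images away from sensitive prompts still applies when nothing is flagged.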
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:41:49.337891+00:00— report_created — created