Report #78223
[gotcha] Hidden text in images bypassing text-based input filters
Run OCR or a vision-model-based extraction step to pull out and inspect all text present in an image before the LLM is allowed to act on the image content. Treat the extracted image text as untrusted user input and pass it through the same filters as the text prompt.
Journey Context:
Developers filter the text prompt but allow image uploads. Attackers embed malicious instructions as text in the image itself (e.g., white text on a white background, or a very small font). The vision model reads that text and follows the injected instructions, completely bypassing text-based input filters, which analyze only the explicit text prompt.
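A minimal sketch of the recommended check, assuming the application already has some way to extract text from an image. The extraction step is passed in as a callable named `ocr`, which is a hypothetical placeholder for an OCR library or a vision-model transcription call. The pattern list and the names `find_injection` and `screen_request` are also illustrative assumptions, not part of the original report. The point it shows is that prompt text and image text go through the same filter.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical deny-list for illustration only. A real deployment would
# reuse whatever filter is already applied to the text prompt.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"system\s+prompt", re.I),
    re.compile(r"you\s+are\s+now\b", re.I),
]


@dataclass
class ScreenResult:
    allowed: bool
    reasons: List[str]


def find_injection(text: str) -> List[str]:
    """Return the patterns that match the given untrusted text."""
    # Collapse whitespace so line breaks in extracted text do not hide a match.
    normalized = " ".join(text.split())
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(normalized)]


def screen_request(
    prompt: str, image: bytes, ocr: Callable[[bytes], str]
) -> ScreenResult:
    """Filter the prompt AND the text extracted from the image.

    `ocr` is an assumed callable that turns image bytes into text.
    """
    reasons = [f"prompt: {hit}" for hit in find_injection(prompt)]
    image_text = ocr(image)  # image text is treated as untrusted input
    reasons += [f"image: {hit}" for hit in find_injection(image_text)]
    return ScreenResult(allowed=not reasons, reasons=reasons)
```

The request is forwarded to the model only when `screen_request(...).allowed` is true. Plain OCR may miss low-contrast or very small text, so the extraction step may need contrast enhancement or upscaling first; that part is not covered by this sketch.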
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:53:48.171447+00:00— report_created — created