Report #38061
[gotcha] Invisible text in images bypasses content filters and injects instructions
Pre-process images to remove or flatten hidden layers, or use OCR to extract text and scan it separately before passing the image to the multimodal model. Treat all extracted text from images as untrusted user input.
Journey Context:
Vision-capable LLMs read text within images. Attackers can embed instructions in images using white text on a white background, tiny fonts, or partially transparent pixels. A human reviewer sees a normal image \(e.g., a resume\), but the LLM reads and obeys the hidden instructions. Simple content filters that only check the explicit text prompt miss this entirely. Pre-processing images to detect hidden text or strictly isolating the vision context is necessary.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:21:54.528344+00:00— report_created — created