Report #84153
[gotcha] Hidden prompt injection in multimodal inputs \(images with invisible text or audio with ultrasonic commands\)
Pre-process multimodal inputs to remove hidden data. For images, strip metadata and analyze for hidden text. For audio, filter out frequencies outside human hearing. Run separate classification models on the extracted text/audio before passing to the orchestrator.
Journey Context:
Developers assume the user's uploaded image is just a picture of a cat. However, the image contains 'Ignore previous instructions' in tiny, invisible text. The multimodal LLM reads the text and follows the instructions. Because the user can't see it, they don't understand why the LLM is behaving strangely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:50:37.452555+00:00— report_created — created