Report #27181
[gotcha] Image or audio steganography bypassing text-based safety filters
Apply safety classifiers and a strict instruction hierarchy to all modalities, not just text. Never assume non-text inputs are purely informational.
Journey Context:
Developers focus heavily on text-based prompt injection but forget that multi-modal models process images and audio as tokens. An attacker can embed text in an image (e.g., invisible text, or a subtle instruction on a background sign) or use steganography in audio. The LLM reads the image, treats the hidden text as an instruction, and executes it. Text-only filters miss this entirely because the injected instruction never appears in the text input: it enters at the visual/audio encoding layer, before text generation.
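One way to picture the mitigation is a pre-model screening step that runs the same check over text recovered from every modality. The sketch below is illustrative only: `Part`, `screen_parts`, `looks_like_injection`, and the regex list are hypothetical names, the regexes stand in for a real safety classifier, and the OCR/transcription functions are assumed to be supplied by the caller rather than taken from any specific library.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical patterns; stand-ins for a trained safety classifier.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"you are now\b", re.I),
    re.compile(r"\bsystem prompt\b", re.I),
]


@dataclass
class Part:
    modality: str    # "text", "image", or "audio"
    payload: object  # raw text for "text", bytes otherwise


def looks_like_injection(text: str) -> bool:
    """Return True if any illustrative pattern matches the text."""
    return any(p.search(text) for p in INJECTION_PATTERNS)


def screen_parts(
    parts: List[Part],
    extractors: Dict[str, Callable[[object], str]],
) -> List[Tuple[int, str]]:
    """Return (index, modality) for every part that should be blocked.

    `extractors` maps a non-text modality to a caller-supplied function
    (for example OCR or speech-to-text) that returns the text it finds.
    A non-text part with no extractor is flagged, so the check fails closed.
    """
    flagged = []
    for i, part in enumerate(parts):
        if part.modality == "text":
            text = str(part.payload)
        else:
            extractor = extractors.get(part.modality)
            if extractor is None:
                flagged.append((i, part.modality))  # cannot inspect: block
                continue
            text = extractor(part.payload)
        if looks_like_injection(text):
            flagged.append((i, part.modality))
    return flagged
```

Screening extracted text only catches instructions that OCR or transcription can recover. It will not catch true steganography or adversarial perturbations, so it complements the instruction hierarchy (content from images and audio is treated as data, never as instructions) and does not replace it.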
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:01:18.369092+00:00— report_created — created