Report #88283
[gotcha] Prompt injection hidden in image or multimodal content returned by MCP tools
Strip or sanitize all multimodal content in tool results before passing them to the LLM. If images must be included, run OCR on them and scan the extracted text for injection patterns; note that OCR alone does not decode QR codes or read image metadata, so check those separately. Log all multimodal content for forensic review. Where possible, replace images with text descriptions. Treat any visual content from untrusted tools as a prompt injection vector equivalent to untrusted text.
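The mitigation above can be sketched roughly as follows. This is a minimal illustration, not a vetted filter. It assumes tool results arrive as a list of content blocks shaped like `{"type": "text", "text": ...}` or `{"type": "image", "data": <base64>, "mimeType": ...}`. The function name `sanitize_tool_result`, the `ocr` callable, and the two regex patterns are hypothetical and exist only for this example; plug in whatever OCR engine and pattern set you actually trust.

```python
import base64
import hashlib
import logging
import re
from typing import Callable, Optional

logger = logging.getLogger("mcp.multimodal")

# Illustrative patterns only (assumption): real deployments need a broader, maintained set.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.I),
    re.compile(r"disregard\s+(the\s+)?(system|above)\s+(prompt|instructions)", re.I),
]


def looks_like_injection(text: str) -> bool:
    """Return True if any illustrative injection pattern matches the text."""
    return any(p.search(text) for p in INJECTION_PATTERNS)


def sanitize_tool_result(
    content: list,
    ocr: Optional[Callable[[bytes], str]] = None,  # hypothetical OCR hook
) -> list:
    """Replace every non-text block with a text placeholder before the LLM sees it."""
    safe = []
    for block in content:
        if block.get("type") == "text":
            # Text blocks pass through here; text scanning is assumed to happen elsewhere.
            safe.append(block)
            continue

        try:
            raw = base64.b64decode(block.get("data") or "")
        except ValueError:  # binascii.Error is a subclass of ValueError
            raw = b""

        # Log a digest so the original payload can be matched during forensic review.
        digest = hashlib.sha256(raw).hexdigest()
        logger.info(
            "non-text block type=%s mime=%s sha256=%s",
            block.get("type"), block.get("mimeType"), digest,
        )

        # Scan OCR output (if an OCR hook was supplied) for injection patterns.
        extracted = ocr(raw) if ocr is not None else ""
        flagged = looks_like_injection(extracted)
        if flagged:
            logger.warning("possible injection text in block sha256=%s", digest)

        # The LLM receives only this placeholder, never the image or the OCR text.
        note = f"[{block.get('type')} removed; sha256={digest[:12]}; flagged={flagged}]"
        safe.append({"type": "text", "text": note})
    return safe
```

The sketch deliberately drops the image instead of forwarding OCR text, since forwarding extracted text would reintroduce the untrusted input it is meant to keep out.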
Journey Context:
Security attention tends to focus on text-based prompt injection, but tool results can also include images carrying embedded text, QR codes, or visual patterns that multimodal LLMs can read. An image returned by a tool might contain text such as "Ignore previous instructions and..." rendered as a subtle overlay, placed in image metadata, or encoded in the image itself. None of this is visible to text-based security monitoring and logging. The agent processes the image, reads the hidden text, and follows the instructions. The result is a monitoring blind spot: security teams believe they are auditing all LLM inputs while missing an entire attack surface. The more capable the LLM's vision, the more dangerous this becomes.
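As one concrete illustration of the metadata case, the sketch below pulls uncompressed `tEXt` chunks out of a PNG so their contents can be logged and scanned like any other untrusted text. It is a simplified example under stated assumptions: the function name `png_text_chunks` is hypothetical, CRCs are not verified, and compressed or international text chunks (`zTXt`, `iTXt`) and other formats such as JPEG EXIF are not handled.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data: bytes) -> list[tuple[str, str]]:
    """Return (keyword, text) pairs from uncompressed tEXt chunks in a PNG.

    Simplified sketch: does not verify CRCs and ignores zTXt/iTXt chunks.
    """
    if not data.startswith(PNG_SIGNATURE):
        return []

    found = []
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt" and b"\x00" in body:
            # tEXt body is "keyword NUL text", both Latin-1 encoded.
            keyword, _, text = body.partition(b"\x00")
            found.append((keyword.decode("latin-1"), text.decode("latin-1")))
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # header (8) + body + CRC (4)
    return found
```

Any text recovered this way would be written to the audit log and checked against the same injection patterns applied to ordinary tool text.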
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:46:09.669186+00:00— report_created — created