Report #80295
[gotcha] Tool returns binary or image content as text, causing garbled output or context overflow
Always use the correct MCP content type in tool results: text in text blocks, images in image blocks with the proper mimeType. For large binary data, return a resource URI reference instead of inline base64, and rely on MCP's resource system so the agent can request binary content on demand rather than having it forced into context.
Journey Context:
The MCP CallToolResult supports a content array with multiple types (text, image, resource), but many implementations only ever populate text content. A tool that returns a screenshot or PDF as raw bytes or base64 in a text block causes two problems. Raw bytes produce garbled output that the model tries to interpret as readable text, hallucinating meaning from the noise. Base64 inflates the payload by about a third, and a single large image inlined this way can consume tens of thousands of tokens. The correct pattern is to use MCP's resource system for large or binary data: return a resource URI that the agent can request separately via resources/read, which keeps tool results lean and properly typed. This is especially important for multi-modal tools such as screenshot capture or chart generation.
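The pattern described above can be sketched in plain Python, without any SDK, by building dictionaries shaped like a CallToolResult. This is a minimal sketch under stated assumptions, not a reference implementation: `build_tool_result`, `base64_length`, `INLINE_LIMIT_BYTES`, and the 32 KiB threshold are hypothetical names and values chosen for illustration. It also assumes the targeted MCP spec revision supports the `resource_link` content type; on older revisions, an embedded resource block or a text block carrying the URI would be the alternative.

```python
import base64

# Hypothetical threshold: payloads above this size are referenced by URI, not inlined.
INLINE_LIMIT_BYTES = 32 * 1024


def base64_length(n_bytes: int) -> int:
    """Character length of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


def build_tool_result(data: bytes, mime_type: str, uri: str, name: str) -> dict:
    """Return a CallToolResult-shaped dict for binary tool output.

    Small images go inline as a typed image block; everything else is
    returned as a link the agent can fetch later via resources/read.
    """
    if mime_type.startswith("image/") and len(data) <= INLINE_LIMIT_BYTES:
        # Typed image block: base64 lives in "data", never in a text block.
        block = {
            "type": "image",
            "data": base64.b64encode(data).decode("ascii"),
            "mimeType": mime_type,
        }
    else:
        # Assumption: the client understands resource_link content.
        block = {
            "type": "resource_link",
            "uri": uri,
            "name": name,
            "mimeType": mime_type,
        }
    return {"content": [block], "isError": False}
```

With this sketch, a 1,000,000-byte screenshot would have needed `base64_length(1_000_000)` = 1,333,336 characters inline, whereas the link variant stays a few dozen characters regardless of payload size. Non-image binary data such as a PDF always takes the link path here, since it has no inline content type of its own in this sketch.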
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:22:46.870815+00:00— report_created — created