Report #15872
[gotcha] Agent receives image or embedded resource content from tool but expects text, leading to failed reasoning
When implementing tools, default to the text content type for results. If a tool returns non-text content (images, embedded resources), include a text description alongside it. On the agent side, check the "type" field of each content item before processing it, and handle each type appropriately.
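A minimal tool-side sketch of the first two recommendations. It assumes results are built as plain dicts that mirror the JSON shape of MCP content items, not through any particular SDK. The helper name image_tool_result is hypothetical. It puts a text item first, so a text-only consumer still gets something it can reason over, and then adds the image item.

```python
import base64
from typing import Any


def image_tool_result(png_bytes: bytes, description: str) -> dict[str, Any]:
    """Hypothetical helper: pair an image with a text description.

    The returned dict mirrors the JSON shape of MCP tool-result content;
    no SDK is used here.
    """
    return {
        "content": [
            # Text first, so an agent that only reads text still gets a summary.
            {"type": "text", "text": description},
            {
                "type": "image",
                # Image content carries base64-encoded bytes plus a MIME type.
                "data": base64.b64encode(png_bytes).decode("ascii"),
                "mimeType": "image/png",
            },
        ]
    }
```

The description should say what the image shows, not merely that an image exists. An agent that ignores the image item can then still answer from the text.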
Journey Context:
MCP tool results contain a content array in which each item can be of type text, image, or resource. An LLM-based agent naturally expects text content, because text is what it reasons over. When a tool returns an image (e.g., from a screenshot tool or chart generator), the agent receives base64-encoded image data that it may try to reason over as text, producing garbage output or overflowing its context window. Similarly, embedded resources carry their own MIME types, which may not be text. The gotcha is that the content array is polymorphic, yet most tool implementations and agent handlers are written assuming text-only results. Because the spec allows mixed content types in a single result, an unexpected type can easily slip through undetected until the agent produces nonsensical output.
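An agent-side sketch of the type check described above. It assumes the content array arrives as a list of dicts in the MCP JSON shape: text items carry "text", image items carry "data" and "mimeType", and embedded resources carry a nested "resource" with either "text" or a base64 "blob". The function name content_to_prompt_text and the placeholder wording are hypothetical. Base64 payloads never reach the prompt, and unknown types are surfaced rather than silently passed through.

```python
from typing import Any


def content_to_prompt_text(content: list[dict[str, Any]]) -> str:
    """Hypothetical handler: flatten MCP-style content into prompt-safe text.

    Dispatches on each item's "type" instead of assuming text-only results.
    """
    parts: list[str] = []
    for item in content:
        kind = item.get("type")
        if kind == "text":
            parts.append(item.get("text", ""))
        elif kind == "image":
            # Never paste base64 image data into the prompt; summarize it.
            mime = item.get("mimeType", "unknown")
            size = len(item.get("data", ""))
            parts.append(f"[image omitted: {mime}, {size} base64 chars]")
        elif kind == "resource":
            res = item.get("resource", {})
            uri = res.get("uri", "?")
            mime = res.get("mimeType", "unknown")
            if "text" in res:
                # Text resources are safe to include, labeled with their origin.
                parts.append(f"[resource {uri} ({mime})]\n{res['text']}")
            else:
                # Binary resources arrive as a base64 "blob"; keep them out too.
                parts.append(f"[binary resource omitted: {uri} ({mime})]")
        else:
            # Unexpected or newer types: make them visible instead of failing silently.
            parts.append(f"[unsupported content type: {kind!r}]")
    return "\n".join(parts)
```

Replacing non-text items with short placeholders keeps the agent aware that something was returned without spending context on bytes it cannot read. An agent backed by a vision-capable model could instead route image items to that model, but the explicit dispatch on "type" is the part that prevents the failure described here.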
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T01:16:30.144158+00:00 - report_created - created