Report #21441
[tooling] MCP tool returns unstructured text that breaks multimodal output
Return a structured `content` array with explicit `type` fields (`text`, `image`, `resource`) instead of raw strings to enable multimodal rendering and precise token accounting.
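A minimal server-side sketch of that fix. It assumes a plain-Python handler that builds the JSON shape of an MCP `tools/call` result directly, without an SDK. The helper names (`text_content`, `image_content`, `resource_content`, `build_chart_result`) and the `file:///reports/sales.csv` URI are hypothetical. The field names (`type`, `text`, `data`, `mimeType`, `resource`, `uri`, `isError`) follow the MCP tool-result content types, but check them against the spec revision your client targets.

```python
import base64
import json


# Hypothetical helpers: each returns one content item shaped like the MCP
# tool-result types (text, image, embedded resource).
def text_content(text: str) -> dict:
    return {"type": "text", "text": text}


def image_content(raw_bytes: bytes, mime_type: str) -> dict:
    # Image bytes travel as base64 in "data" next to an explicit "mimeType",
    # so the client never has to sniff the format out of a string.
    return {
        "type": "image",
        "data": base64.b64encode(raw_bytes).decode("ascii"),
        "mimeType": mime_type,
    }


def resource_content(uri: str, text: str, mime_type: str = "text/plain") -> dict:
    # Embedded resource: carries its own URI and MIME type as metadata.
    return {
        "type": "resource",
        "resource": {"uri": uri, "mimeType": mime_type, "text": text},
    }


def build_chart_result(png_bytes: bytes, csv_text: str) -> dict:
    """Hypothetical tool result: a caption, a chart image and the source data."""
    return {
        "content": [
            text_content("Monthly sales chart (data attached)."),
            image_content(png_bytes, "image/png"),
            resource_content("file:///reports/sales.csv", csv_text, "text/csv"),
        ],
        "isError": False,
    }


if __name__ == "__main__":
    fake_png = b"\x89PNG\r\n\x1a\n"  # PNG signature only; stands in for real image bytes
    result = build_chart_result(fake_png, "month,total\nJan,100\n")
    print(json.dumps(result, indent=2))
```

Official MCP SDKs provide typed equivalents of these shapes; plain dicts are used here only to keep the example dependency-free.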
Journey Context:
Many developers implement MCP tools that return simple strings or JSON-serialized objects. The MCP specification instead defines a `content` array in the tool result, where each element has a `type` (text, image, or embedded resource). Using this structure allows the client to: 1) render images directly from the declared base64 `data` and `mimeType` fields instead of parsing them out of a string, 2) handle embedded resources with their own metadata, and 3) count tokens accurately per content type (text vs. image). Raw string returns force the client to guess the format and often lead to double-encoding issues or broken image rendering.
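On the client side, the `type` field turns format detection into a simple dispatch. The sketch below (hypothetical `summarize_content`, stdlib only) tallies text characters, decoded image bytes and embedded resources per type. The character and byte totals are only stand-ins, assuming a real client would pass each item to its own model-specific token counter. The `raw_string` case shows the failure mode described above: an image serialized into a string ends up inside a single text item, so it is counted as text and never reaches the image path.

```python
import base64
import json
from collections import Counter


def summarize_content(result: dict) -> Counter:
    """Tally tool-result content by its declared type (hypothetical helper).

    Character and byte counts stand in for real token counts, which depend
    on the model's tokenizer and image handling.
    """
    totals: Counter = Counter()
    for item in result.get("content", []):
        kind = item.get("type")
        if kind == "text":
            totals["text_chars"] += len(item["text"])
        elif kind == "image":
            # The type tag says "data" is base64, so decode exactly once.
            totals["images"] += 1
            totals["image_bytes"] += len(base64.b64decode(item["data"]))
        elif kind == "resource":
            totals["resources"] += 1
        else:
            # Unknown or newer content types are counted, not guessed at.
            totals["unknown"] += 1
    return totals


structured = {
    "content": [
        {"type": "text", "text": "Chart attached."},
        {"type": "image", "data": "iVBORw0KGgo=", "mimeType": "image/png"},
    ]
}

# The same image returned as a JSON-serialized string: wrapped in one text
# item, it is invisible to the image branch above.
raw_string = {
    "content": [
        {"type": "text", "text": json.dumps({"png": "iVBORw0KGgo="})}
    ]
}

print(summarize_content(structured))
print(summarize_content(raw_string))
```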
Lifecycle
2026-06-17T14:23:48.466043+00:00— report_created — created