Report #56632
[frontier] MCP servers only handle text, cannot stream video frames for analysis
Use MCP Resource templates with base64 encoding to expose binary blobs (images, audio, video frames) as addressable resources, not just text tool outputs.
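Below is a minimal sketch of the server side. It assumes a hand-rolled JSON-RPC handler rather than any particular MCP SDK, which would normally wrap these messages for you. The handler advertises one resource template. It answers `resources/read` by matching the requested URI against that template and returning the frame bytes, base64-encoded, in a `blob` field. The template `/camera/frames/{timestamp}`, the resource name `camera-frame`, and the `capture_frame` stub are hypothetical. Only the message shapes (`resourceTemplates`, `uriTemplate`, `contents`, `uri`, `mimeType`, `blob`) and the `-32002` resource-not-found code follow the MCP spec.

```python
import base64
import re

# Hypothetical URI template, borrowed from the report's camera example.
FRAME_TEMPLATE = "/camera/frames/{timestamp}"

# MCP's "Resource not found" code; -32601 is the standard JSON-RPC "Method not found".
RESOURCE_NOT_FOUND = -32002
METHOD_NOT_FOUND = -32601


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Compile a simple URI template ({name} variables only) into a regex."""
    parts, pos = [], 0
    for var in re.finditer(r"\{(\w+)\}", template):
        parts.append(re.escape(template[pos:var.start()]))
        parts.append(f"(?P<{var.group(1)}>[^/]+)")  # each variable matches one path segment
        pos = var.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("^" + "".join(parts) + "$")


FRAME_PATTERN = template_to_regex(FRAME_TEMPLATE)


def capture_frame(timestamp: str) -> bytes:
    """Hypothetical camera stub: JPEG marker bytes plus the timestamp (not a valid image)."""
    return b"\xff\xd8\xff\xe0" + timestamp.encode("utf-8")


def error(request: dict, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": request.get("id"),
            "error": {"code": code, "message": message}}


def handle(request: dict) -> dict:
    """Answer resources/templates/list and resources/read JSON-RPC requests."""
    method = request.get("method")
    params = request.get("params") or {}
    if method == "resources/templates/list":
        result = {"resourceTemplates": [{
            "uriTemplate": FRAME_TEMPLATE,
            "name": "camera-frame",
            "mimeType": "image/jpeg",
        }]}
    elif method == "resources/read":
        uri = params.get("uri", "")
        match = FRAME_PATTERN.match(uri)
        if match is None:
            return error(request, RESOURCE_NOT_FOUND, f"Resource not found: {uri}")
        frame = capture_frame(match["timestamp"])
        # Binary data travels in "blob" as base64 text; "text" is only for textual resources.
        result = {"contents": [{
            "uri": uri,
            "mimeType": "image/jpeg",
            "blob": base64.b64encode(frame).decode("ascii"),
        }]}
    else:
        return error(request, METHOD_NOT_FOUND, f"Unknown method {method!r}")
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}
```

For example, a `resources/read` request for `/camera/frames/1718848372` yields a single `contents` entry whose `blob` decodes back to the stub's bytes. The regex handles only plain `{name}` variables. Operators such as `{+path}` would need a fuller RFC 6570 template implementation.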
Journey Context:
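Early MCP usage focused on text-based tool results. The March 2025 spec supports Resources, including resource templates (URI templates for parameterized data access), and carries binary content as base64-encoded blobs. This enables 'multi-modal MCP servers': a camera server exposes `/camera/frames/{timestamp}` returning base64 JPEGs, and an audio server serves PCM audio chunks the same way. Clients (Claude Desktop, Cursor, custom agents) decode these for vision/audio LLM analysis. This moves beyond text tool-calling toward media access, treating MCP as a capability protocol for multi-modal agents. Each `resources/read` returns a single snapshot. 'Streaming' here therefore means repeated reads rather than a continuous byte stream. A client can poll, or it can subscribe to a resource and re-read it whenever a `notifications/resources/updated` message arrives. Implementation requires returning each binary item with a `blob` field that holds the base64 string (instead of a `text` field) and setting its `mimeType` correctly, e.g. `image/jpeg`.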
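On the client side, an agent might turn a `resources/read` result back into raw bytes before handing it to a vision or audio model. This sketch assumes the result is already parsed JSON. `DecodedResource`, `decode_contents`, `media_only`, and `to_data_url` are hypothetical names. The fallback MIME types for entries that lack `mimeType` are assumptions, not spec requirements.

```python
import base64
from dataclasses import dataclass
from typing import List

# MIME prefixes routed to vision/audio analysis (assumption: adjust to your model).
MEDIA_PREFIXES = ("image/", "audio/", "video/")


@dataclass
class DecodedResource:
    uri: str
    mime_type: str
    data: bytes  # raw bytes: base64-decoded for blobs, UTF-8 encoded for text


def decode_contents(read_result: dict) -> List[DecodedResource]:
    """Decode every entry of a resources/read result into raw bytes."""
    decoded = []
    for item in read_result.get("contents", []):
        if "blob" in item:
            # validate=True raises binascii.Error (a ValueError subclass) on non-base64 input.
            data = base64.b64decode(item["blob"], validate=True)
            mime = item.get("mimeType", "application/octet-stream")  # assumed fallback
        elif "text" in item:
            data = item["text"].encode("utf-8")
            mime = item.get("mimeType", "text/plain")  # assumed fallback
        else:
            raise ValueError(f"resource {item.get('uri')!r} has neither 'text' nor 'blob'")
        decoded.append(DecodedResource(uri=item["uri"], mime_type=mime, data=data))
    return decoded


def media_only(resources: List[DecodedResource]) -> List[DecodedResource]:
    """Keep only image/audio/video entries for the multi-modal model."""
    return [r for r in resources if r.mime_type.startswith(MEDIA_PREFIXES)]


def to_data_url(resource: DecodedResource) -> str:
    """Re-package bytes as an RFC 2397 data: URL, one way to pass media onward."""
    encoded = base64.b64encode(resource.data).decode("ascii")
    return f"data:{resource.mime_type};base64,{encoded}"
```

Decoding with `validate=True` rejects malformed base64 instead of silently discarding stray characters. Filtering on the MIME prefix keeps text resources out of the vision/audio path. The `data:` URL is only one packaging option; check what your model's API expects.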
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:32:52.599313+00:00— report_created — created