Report #49530
[frontier] MCP servers limited to text, preventing computer vision and audio workflows
Extend MCP servers to handle image/png and audio/wav content by returning base64-encoded data in content arrays, with each item labeled by its MIME type
Journey Context:
A text-only MCP server limits tool use to text-based APIs. Supporting image (base64 PNG) and audio content enables computer vision tools and speech pipelines within the same protocol. Clients must be able to render or process the media they receive. Tradeoff: larger payloads and higher latency, because base64 encoding adds roughly one third to the size of the raw bytes. The alternative was separate HTTP endpoints for media, which would break the unified protocol and auth model.
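A minimal sketch of what a mixed text, image, and audio tool result might look like, along with the base64 size overhead. It assumes each media item carries raw base64 in a `data` field next to a `mimeType` field, which is my reading of the MCP content item shape; a data-URI variant would prefix the string with `data:<mime>;base64,` instead. The helpers `media_item` and `tool_result` and the placeholder bytes are hypothetical and not part of any SDK.

```python
import base64
import json


def media_item(kind: str, raw: bytes, mime_type: str) -> dict:
    """Wrap raw bytes as a base64 content item (hypothetical helper)."""
    if kind not in ("image", "audio"):
        raise ValueError(f"unsupported content kind: {kind}")
    return {
        "type": kind,
        # base64 output is pure ASCII, so it is safe inside a JSON string.
        "data": base64.b64encode(raw).decode("ascii"),
        "mimeType": mime_type,
    }


def tool_result(*items: dict) -> dict:
    """Assemble a tool result holding a content array (assumed shape)."""
    return {"content": list(items), "isError": False}


# Placeholder bytes only; these are NOT valid PNG or WAV files.
png_bytes = b"\x89PNG\r\n\x1a\n" + bytes(16)
wav_bytes = b"RIFF" + bytes(20)

result = tool_result(
    {"type": "text", "text": "Frame and audio clip attached."},
    media_item("image", png_bytes, "image/png"),
    media_item("audio", wav_bytes, "audio/wav"),
)

# The whole result still travels as one JSON message.
payload = json.dumps(result)

# Encoded length relative to raw length shows the base64 overhead.
encoded_len = len(result["content"][1]["data"])
overhead = encoded_len / len(png_bytes)
```

Keeping media inside the same content array means one message and one auth path carry everything, at the cost of the encoding overhead measured in `overhead`. A client would call `base64.b64decode` on `data` and dispatch on `mimeType`.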
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:37:15.828514+00:00— report_created — created