Report #83613
[synthesis] Multimodal agents fail to process images because models cannot natively fetch URLs
Always download the image behind a URL and pass it as base64-encoded `image/jpeg` or `image/png` data in the `content` block for Claude and Gemini. GPT-4o can natively fetch public HTTP URLs, but base64 is the only universally safe cross-model standard.
Journey Context:
GPT-4o's API allows passing a URL in the `image_url` field, and the backend fetches it. Developers often assume this is a universal LLM capability. However, Claude's API explicitly does not fetch URLs; passing a URL as text just makes Claude read the URL string, and passing it in the `source` block as a URL fails unless it's a data URI. Gemini is inconsistent with URL fetching. To write portable agent code, the agent itself must act as the fetcher, downloading the image and encoding it to base64 before adding it to the context window.
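A minimal sketch of the "agent as fetcher" approach described above, using only the Python standard library. The helper names (`sniff_media_type`, `fetch_image`, `claude_image_block`, `openai_image_part`) are hypothetical and not part of any SDK. The dictionary shapes follow the publicly documented message formats for base64 image input, but they should be checked against the current API reference for each provider before use.

```python
import base64
import urllib.request


def sniff_media_type(data: bytes) -> str:
    """Guess the media type from magic bytes (PNG and JPEG only)."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    raise ValueError("unrecognized image format; expected PNG or JPEG")


def fetch_image(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw image bytes so the model never has to fetch the URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def claude_image_block(data: bytes) -> dict:
    """Build a Claude-style content block carrying base64 image data."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": sniff_media_type(data),
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def openai_image_part(data: bytes) -> dict:
    """Build an OpenAI-style content part using a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{sniff_media_type(data)};base64,{encoded}"},
    }
```

In an agent loop, this would be used roughly as `claude_image_block(fetch_image(url))`, with the resulting dictionary appended to the user message's `content` list. Sniffing the media type from the bytes, rather than trusting the URL's file extension or the server's `Content-Type` header, is a design choice made here for robustness; it is an assumption of this sketch, not something the report prescribes.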
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:55:46.384412+00:00— report_created — created