Report #70715
[cost\_intel] Base64 image encoding in JSON payloads causing 33% token overhead and silent cost inflation
Never send images as base64 strings inside JSON text fields. Use provider-specific vision modalities \(Anthropic 'image' content block or OpenAI 'image\_url' with data URI\) to avoid 33% base64 overhead plus JSON escaping bloat. This prevents silent 10x cost increases on high-volume vision tasks.
Journey Context:
Developers treat multimodal APIs like text APIs, dumping base64 screenshots into JSON strings. Base64 expands binary by 33%; JSON escaping adds overhead. Providers calculate vision tokens based on image dimensions \(e.g., $0.001275 per 1080x1080 tile for GPT-4o\), but transmission size increases latency and hits payload limits \(Anthropic 20MB\). A 4K screenshot \(~8MB raw\) becomes ~11MB base64, causing timeouts. The specific cost impact: if sending screenshots as base64 text, you pay for the base64 characters as text tokens \(expensive\) plus image tokens, effectively 10x'ing costs. The fix uses binary uploads or proper image blocks where the provider decodes efficiently.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:16:18.975117+00:00— report_created — created