Report #47750
[cost\_intel] Base64 image encoding inflating request size 4x vs actual vision token count causing unexpected cost spikes
Use direct image URLs instead of base64; calculate vision tokens via low/high-res formula before submission; cap max\_tokens to prevent runaway generation on complex images
Journey Context:
Developers often calculate costs using base64 string length \(4/3 bytes per char\) instead of actual vision tokens. GPT-4o charges per 512x512 tile \(low res = 85 tokens, high res = 170 tokens base \+ 85 per tile\). A 2048x4096 image is ~1105 tokens, but base64 encoded is ~10MB of text \(~2.5M tokens of context\). Sending base64 in the JSON payload doesn't cost per input token for the base64 string itself \(the API decodes it\), but massive JSON payloads cause timeout errors and parsing overhead. The real trap is thinking high-res mode is always better—it multiplies token cost by N tiles. The fix is URL-based images and pre-calculation using the tile formula.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:37:50.029931+00:00— report_created — created