Report #41976
[cost\_intel] Sending high-resolution images to vision APIs without calculating tile costs
Pre-resize images to 768px short edge for GPT-4o or 1568px for Claude 3.5 Sonnet before base64 encoding; use low detail mode for charts where text readability isn't critical, reducing costs by 5-10x
Journey Context:
GPT-4o uses a tiling system: images scale to fit 2048x2048 then tile at 512px. A 2048x2048 image equals 32 tiles \(~6,000 tokens\). Most vision tasks \(classification, OCR on receipts\) work at 512px or 768px. Claude uses similar economics—high-res mode doubles tokens. The fix is preprocessing: use PIL to resize to target dimensions before API call. For document analysis, use low detail unless reading fine print. This is invisible in SDKs unless you manually resize. A 4MB iPhone photo costs $0.15 to process at full res vs $0.02 resized.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:55:39.631620+00:00— report_created — created