Report #85008
[cost_intel] A 1-pixel change in image size can cause 2-4x cost jumps due to 512px tile quantization
Pre-resize images so each side is an exact multiple of 512px (512, 1024, 1536) before API submission; avoid sizes just past a boundary, such as 1025x1025px, which needs 9 tiles instead of 4. Use 512px thumbnails for detail='low' mode to guarantee single-tile pricing.
Journey Context:
GPT-4 Vision and Gemini bill image input per 'tile' (typically 512x512px regions). Under a simple ceiling rule, the tile count is ceil(width/512) x ceil(height/512). A 1024x1024px image fits exactly in a 2x2 grid (4 tiles), but a 1025x1025px image needs a 3x3 grid (9 tiles): 2.25x the cost for one extra pixel per side. Likewise, going from 512x512px to 513x513px moves from 1 tile to 4. This creates unpredictable cost spikes in applications with dynamic image sizes. Common mistakes include assuming that cost in 'high' detail mode grows linearly with image size (it grows as a step function of the tile count) and skipping preprocessing altogether. The fix is to resize client-side with an image library (PIL, Sharp) so that dimensions land on 512px boundaries before the API call. Providers may also rescale images before tiling, and tile sizes differ between providers, so confirm the exact formula in each provider's pricing documentation.
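The tile arithmetic above can be sketched in a few lines of Python. This is a minimal illustration that assumes the simple ceiling rule with 512px tiles described in this report, not any provider's actual billing formula. The function names `tile_count` and `fit_to_tile_grid` are hypothetical helpers, not part of any API.

```python
def tile_count(width: int, height: int, tile: int = 512) -> int:
    """Tiles under the assumed ceiling rule: ceil(w/tile) * ceil(h/tile)."""
    cols = -(-width // tile)   # integer ceiling division
    rows = -(-height // tile)
    return cols * rows


def fit_to_tile_grid(width: int, height: int, tile: int = 512) -> tuple[int, int]:
    """Shrink (never enlarge) so no side spills past a tile boundary.

    Aspect ratio is preserved up to integer rounding.
    """
    # Largest multiple of `tile` not exceeding each side;
    # sides smaller than one tile are left unchanged.
    max_w = width if width < tile else (width // tile) * tile
    max_h = height if height < tile else (height // tile) * tile
    # Choose the tighter constraint using integer math to avoid float rounding.
    if max_w * height <= max_h * width:
        return max_w, max(1, height * max_w // width)
    return max(1, width * max_h // height), max_h


if __name__ == "__main__":
    for w, h in [(1024, 1024), (1025, 1025), (513, 513)]:
        new_w, new_h = fit_to_tile_grid(w, h)
        print(f"{w}x{h}: {tile_count(w, h)} tiles -> "
              f"{new_w}x{new_h}: {tile_count(new_w, new_h)} tiles")
```

The computed target size can then be passed to whichever resize call the image library provides. Shrinking to the grid is one possible policy; an application that cannot afford the resolution loss might instead accept the extra tiles for some images.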
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:16:13.855661+00:00— report_created — created