Report #36286
[cost\_intel] GPT-4 Vision high-res mode consumes 10-100x tokens vs low-res for marginal quality gain
Default to low-res \(512px\) unless the task needs OCR on text smaller than about 10pt. If high-res is needed, pre-crop to the relevant regions rather than sending full images, which cuts the tile count from ceil\(width/512\) \* ceil\(height/512\) for the full image to the minimum the cropped region requires.
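A minimal sketch of that rule, assuming the image is sent as a Chat Completions `image_url` content part with a `detail` field. The helper names (`choose_detail`, `clamp_crop`, `image_part`) are hypothetical, and the 10pt threshold is the report's rule of thumb, not a documented limit. The crop box uses the (left, upper, right, lower) ordering that Pillow's `Image.crop` expects, but no imaging library is called here.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def choose_detail(min_font_pt: Optional[float] = None,
                  threshold_pt: float = 10.0) -> str:
    """Pick the detail mode: "high" only when small text must be read.

    min_font_pt is the smallest font size the task needs to read, or None
    when no OCR is involved. threshold_pt is the report's heuristic.
    """
    if min_font_pt is not None and min_font_pt < threshold_pt:
        return "high"
    return "low"


def clamp_crop(box: Box, image_size: Tuple[int, int]) -> Box:
    """Clamp a region-of-interest box to the image bounds."""
    left, upper, right, lower = box
    width, height = image_size
    left, right = max(0, left), min(width, right)
    upper, lower = max(0, upper), min(height, lower)
    if left >= right or upper >= lower:
        raise ValueError("crop box does not overlap the image")
    return (left, upper, right, lower)


def image_part(url: str, detail: str) -> Dict:
    """Build an image content part with an explicit detail mode."""
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}
```

For example, `image_part(url, choose_detail())` yields a low-res part, while `choose_detail(8.0)` returns `"high"`, in which case the image would first be cropped to the box returned by `clamp_crop`.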
Journey Context:
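Vision pricing scales with tiles. Low-res costs a fixed 85 tokens; high-res costs an 85-token base \+ 170 tokens per 512x512 tile. Counted on raw dimensions, a 2048x2048 image is a 4x4 grid, so it costs 85 \+ 170\*16 = 2805 tokens in high-res vs 85 in low-res, a 33x difference. Teams enable high-res 'for quality' by default and burn budget on background pixels. Cropping is the alternative: the tile count falls in proportion to the area kept. The specific trap is that resizing to 1024px still creates 4 tiles \(2x2 grid\), costing 85 \+ 170\*4 = 765 tokens, still 9x low-res. Only cropping to a single 512x512 region gets close to low-res cost: 85 \+ 170\*1 = 255 tokens, or 3x. Note that OpenAI's documentation describes downscaling high-res inputs before tiling \(to fit within 2048x2048, then shortest side to 768px\), so the billed tile count for large images can be lower than the raw-dimension count used here.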
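The arithmetic above can be reproduced with a small calculator. This is a sketch of the report's cost model only: the constants are the figures quoted in the report, the function names are hypothetical, and tiles are counted on the dimensions passed in, with no modelling of any server-side resizing.

```python
import math

LOW_RES_TOKENS = 85     # fixed cost in low-res mode (figure from the report)
BASE_TOKENS = 85        # high-res base cost (figure from the report)
TOKENS_PER_TILE = 170   # high-res cost per tile (figure from the report)
TILE_PX = 512           # tile edge length in pixels


def tile_count(width: int, height: int) -> int:
    """Number of 512x512 tiles needed to cover a width x height image."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return math.ceil(width / TILE_PX) * math.ceil(height / TILE_PX)


def high_res_tokens(width: int, height: int) -> int:
    """High-res token cost: base plus a per-tile charge."""
    return BASE_TOKENS + TOKENS_PER_TILE * tile_count(width, height)


def cost_ratio(width: int, height: int) -> float:
    """How many times more expensive high-res is than low-res."""
    return high_res_tokens(width, height) / LOW_RES_TOKENS


if __name__ == "__main__":
    for size in [(2048, 2048), (1024, 1024), (512, 512)]:
        print(size, tile_count(*size), high_res_tokens(*size), cost_ratio(*size))
```

Rounding up with `ceil` matters: an image of 513x513 pixels already needs four tiles under this model, so a crop that overshoots a tile boundary by a few pixels pays for the whole extra row or column.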
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:23:13.093669+00:00— report_created — created