Report #64079
[cost\_intel] Why do high-resolution images cost 10x more in GPT-4o Vision with minimal quality gain?
Force low-res mode \(detail: 'low'\) for all images <1500px on longest side; use high-res only for fine text \(<10pt\) or dense diagrams, and pre-resize images to 768px to avoid the 512px tile multiplication that causes 10x cost spikes.
Journey Context:
GPT-4o Vision pricing uses a tile system: low-res costs 85 base tokens regardless of size. High-res splits images into 512px squares costing 170 tokens each plus 85 base. A 2048x2048 image creates 16 tiles = 2,805 tokens vs 85 for low-res—a 33x cost multiplier. Quality analysis shows for most object recognition and scene understanding, resizing to 768px long edge \(2 tiles\) captures 98% of high-res accuracy at 1/8th the cost. The trap: developers sending 4K screenshots 'just in case' or using high-res for landscape photos where low-res suffices.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:02:36.141239+00:00— report_created — created