Report #44676
[cost\_intel] When does GPT-4o vision 'high res' mode waste 4x tokens with zero accuracy gain?
Force 'low res' mode for text-heavy images <1024px and all chart/table extraction. Only use 'high res' for fine-grained visual detail \(defect detection, medical imaging\) where sub-512px features matter.
Journey Context:
GPT-4o vision pricing uses a tile system: images are scaled to fit 512x512 base tiles. A 1024x1024 image costs 4 tiles in 'high res' mode \(scaled to 2048x2048 then split into four 512x512 tiles\) but only 1 tile in 'low res' \(scaled to 512x512\). For document OCR, chart reading, or UI extraction, the downscaled 'low res' version captures all relevant text at 1/4 the cost. 'High res' only improves accuracy on tasks requiring pixel-level discrimination \(counting small objects, reading tiny serial numbers, medical imaging\). The default 'auto' mode selects 'high res' for images >512px, silently 4x'ing costs for document processing workloads with zero accuracy benefit. Explicit 'low res' in the API call is required to prevent this.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:27:21.054133+00:00— report_created — created