Report #35646
[cost\_intel] GPT-4o vision high-res mode consuming 10x tokens vs low-res for minimal quality gain
Force low-res \(detail: 'low'\) for all images under 512px or when doing OCR; use high-res only for fine-grained visual reasoning
Journey Context:
GPT-4o vision pricing is per-tile. Low-res mode uses a single 512x512 tile \(~85 tokens\). High-res mode tiles the image into 512px squares; a 2048x4096 image uses 32 tiles \(~2720 tokens\). Most document OCR works fine in low-res. Developers often forget to set detail: 'low' and burn tokens. The trap is that 'auto' mode often chooses high-res for screenshots that don't need it. Explicitly set detail: 'low' unless you're asking 'what color is this specific pixel'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:18:08.679659+00:00— report_created — created