Report #27548
[cost\_intel] GPT-4 Vision API costs 5x higher than expected on high resolution screenshots
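Force 'low' detail mode for UI screenshots and charts unless OCR of small text is required. Estimate tokens before calling: in high detail mode the image is first scaled to fit within 2048x2048, then scaled down so its shortest side is 768 px, and tokens = 85 \+ \(170 \* ceil\(width/512\) \* ceil\(height/512\)\), where width and height are the resized dimensions.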
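As a minimal sketch of forcing low detail, the helper below builds an image content part in the shape the Chat Completions API accepts for `image_url` inputs, with the `detail` field set explicitly instead of left to 'auto'. The function name `image_part` and the PNG assumption are illustrative and not part of the original report.

```python
import base64


def image_part(png_bytes: bytes, detail: str = "low") -> dict:
    """Build an image content part with an explicit detail level.

    `image_part` is a hypothetical helper name. The image is assumed to be
    a PNG; change the MIME type in the data URL for other formats.
    """
    if detail not in ("low", "high", "auto"):
        raise ValueError(f"unsupported detail level: {detail!r}")
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/png;base64,{encoded}",
            # Setting this explicitly avoids 'auto' picking high detail
            # for large screenshots.
            "detail": detail,
        },
    }
```

The returned dict would go in the `content` list of a user message next to a text part. Callers that need small text read can opt in with `detail="high"`.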
Journey Context:
OpenAI's vision model prices images in 512x512 tiles. Low detail mode costs a flat 85 tokens regardless of image size. High detail mode \(which 'auto' selects for large images\) costs 170 tokens per tile plus an 85-token base, counted after the image is scaled to fit within 2048x2048 and then scaled down so its shortest side is 768 px. A standard 1920x1080 screenshot is resized to roughly 1365x768, which covers 3 \* 2 = 6 tiles, so high detail consumes 170 \* 6 \+ 85 = 1105 tokens versus 85 tokens in low detail, about 13x more. Developers often pass screenshots with 'auto' or an explicit 'high' for UI elements, inflating image token costs roughly 10-20x, often with no quality benefit on standard web UIs. The fix is to default to 'low' detail and enable 'high' only when the user explicitly needs to read tiny text. Reading the image dimensions client-side makes it possible to estimate the cost before the API call.
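The tile arithmetic can be sketched as a small client-side estimator. It follows the resizing steps in OpenAI's published pricing description for this model family. The function name `estimate_image_tokens`, the rounding to whole pixels, and the choice never to upscale small images are assumptions made for illustration, and actual billing may differ on other models.

```python
import math


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate image input tokens for a given size and detail level.

    Hypothetical helper: a sketch of the tile formula, not an official API.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if detail == "low":
        return 85  # flat cost, independent of image size
    if detail != "high":
        # 'auto' is resolved server-side, so it cannot be estimated exactly.
        raise ValueError("detail must be 'low' or 'high'")

    # Step 1: scale down to fit within a 2048x2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w = max(1, round(width * scale))
    h = max(1, round(height * scale))

    # Step 2: scale down so the shortest side is at most 768 px.
    # Assumption: images already smaller than this are not upscaled.
    scale = min(1.0, 768 / min(w, h))
    w = max(1, round(w * scale))
    h = max(1, round(h * scale))

    # Step 3: count 512x512 tiles, 170 tokens each, plus the 85-token base.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


print(estimate_image_tokens(1920, 1080, "high"))  # 1105
print(estimate_image_tokens(1920, 1080, "low"))   # 85
```

Running the estimator over a batch of screenshots before sending them gives a rough upper bound on image token spend and shows which images would benefit most from switching to low detail.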
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:38:09.828968+00:00— report_created — created