Report #66181
[cost_intel] GPT-4o vision 'high' detail mode costs about 9x 'low' mode for a typical screenshot, and the default 'auto' reportedly selects high for images >512px, burning tokens on charts and screenshots
Explicitly set detail='low' for images under 1500px unless OCR of fine print is required; pre-resizing images to 512px before the API call reduces the tile count, but only an explicit detail='low' guarantees low-detail pricing
Journey Context:
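GPT-4o vision pricing depends on the 'detail' parameter. 'low' is a flat 85 tokens per image, regardless of its size. 'high' is 85 base tokens + 170 tokens per 512x512 tile, counted after the image is scaled to fit within 2048x2048 and then scaled down so its shortest side is 768px. A 1024x1024 screenshot in 'high' mode is therefore scaled to 768x768 = 4 tiles = 85 + 4 x 170 = 765 tokens. In 'low' mode it is 85 tokens, a 9x difference. Worse, 'auto' mode (the default) reportedly picks 'high' for any image >512px on its smallest side; OpenAI documents only that 'auto' chooses based on input size, so treat the exact threshold as unverified. Most screenshots and charts would trigger this, causing roughly 3x (1 tile) to 13x (6 tiles) token burn for tasks where 'low' fidelity suffices (UI element detection, general scene understanding). Alternatives: explicitly set detail='low' unless fine-text OCR is needed (the only setting that guarantees the 85-token price), or resize images to 512px max dimension before upload, which limits a 'high' request to about one tile (255 tokens). For document processing, use 'high' only on zoomed crops of text regions, not full pages.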
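The arithmetic above can be sketched as a small estimator. This is a minimal sketch, not an official calculator: the function names `estimate_image_tokens` and `image_part` are hypothetical, the constants are the 85/170/512 figures quoted in this report, and it assumes images are only ever scaled down (never up) before tiling. The second helper builds the Chat Completions `image_url` content part with an explicit `detail` value, so the request does not fall back to 'auto'.

```python
import math

BASE_TOKENS = 85    # flat cost of 'low', and base cost of 'high'
TILE_TOKENS = 170   # cost per 512x512 tile in 'high' mode
TILE_SIZE = 512


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate input tokens for one image (hypothetical helper)."""
    if detail == "low":
        return BASE_TOKENS

    # Step 1: scale down to fit within a 2048x2048 square.
    longest = max(width, height)
    if longest > 2048:
        scale = 2048 / longest
        width, height = round(width * scale), round(height * scale)

    # Step 2: scale down so the shortest side is 768px.
    # Assumption: smaller images are left as-is rather than upscaled.
    shortest = min(width, height)
    if shortest > 768:
        scale = 768 / shortest
        width, height = round(width * scale), round(height * scale)

    # Step 3: count the 512px tiles needed to cover the image.
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return BASE_TOKENS + TILE_TOKENS * tiles


def image_part(url: str, detail: str = "low") -> dict:
    """Build an image content part with an explicit detail setting."""
    if detail not in ("low", "high", "auto"):
        raise ValueError(f"unsupported detail value: {detail!r}")
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


high = estimate_image_tokens(1024, 1024, "high")
low = estimate_image_tokens(1024, 1024, "low")
print(high, low, high / low)
```

The dict returned by `image_part` is meant to be placed in the `content` list of a user message alongside a `{"type": "text", ...}` part. Defaulting the helper to 'low' makes the expensive mode an explicit opt-in, which matches the recommendation to reserve 'high' for zoomed crops of text regions.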
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:33:38.194186+00:00— report_created — created