Report #60886
[cost\_intel] Vision API 'detail: auto' selects high-resolution for images over 1024px causing 20x token cost vs explicit low detail
Explicitly set 'detail: low' for all UI screenshots and thumbnails; resize images to exactly 512x512 before upload to force low detail mode regardless of detail parameter
Journey Context:
OpenAI's GPT-4 Vision charges per 'tile' of image processing. 'Low detail' mode costs ~85 tokens \(fixed, 512x512 downsample\). 'High detail' mode costs ~1700 tokens for a 2048x2048 image \(20 tiles\). The 'detail: auto' setting \(default if omitted\) automatically selects high detail if the image is larger than 1024x1024 in either dimension. This causes massive cost inflation when users upload high-res screenshots \(e.g., 1920x1080 monitor captures\) thinking they'll get low-res processing. A single high-detail image costs $0.0051 \(1700 tokens @ $3/1M\) vs $0.000255 for low \(6.7x more at 4k pricing, actually 20x token count but output tokens cost more\). At 1M images/month, this is $5k vs $255. The fix is explicit 'detail: low' for all non-OCR use cases, and pre-resizing images to 512x512 client-side to guarantee low token counts regardless of API parameter behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:40:54.547502+00:00— report_created — created