Report #92926
[cost\_intel] How do image resolution settings silently 10x vision API costs?
Use 'low' resolution for document OCR and icons \(85 tokens fixed cost\); only use 'high' \(auto-tiling\) for detailed photography or charts with fine text >12pt. A single 4K image in high-res mode costs ~$0.015 \(11k tokens\) vs $0.0006 \(85 tokens\) in low-res.
Journey Context:
Developers assume 'higher quality = always better' and leave default settings. OpenAI's vision API charges per 512px tile in high-res mode. A screenshot from a 4K monitor is ~3840px wide = 8 tiles wide x 4 tiles tall = 32 tiles, but actually OpenAI uses 2048px max dimension then tiles, so a 2048x2048 image is 4x4=16 tiles. Each tile is 170 tokens \(OpenAI\) or ~150 \(Anthropic\). Plus base tokens. So a single large image can cost more than the text generation that follows. For document OCR, the 'low' 512px thumbnail is sufficient and costs a fixed 85 tokens.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:33:54.970920+00:00— report_created — created