Report #36566
[cost\_intel] Vision high-res token bloat: the 7-16x cost multiplier on image detail settings
Force 'low' resolution mode in OpenAI Vision API for OCR and simple classification; use 'high' or 'auto' only for fine-detail engineering diagrams. A 2048x4096 image costs 935 tokens \($0.004675\) in high-res vs 85 tokens \($0.000425\) in low-res—an 11x cost difference with no accuracy gain on text tasks.
Journey Context:
OpenAI GPT-4o Vision charges by 'tile' \(512x512 chunks\). Low-res mode always costs 85 tokens. High-res mode: image is scaled to fit 2048x2048, then shortest side scaled to 768px, then 512px squares extracted with 2px overlap. A 2048x4096 image becomes 768x1536 after scaling, yielding 2x3 = 6 tiles = 85 \+ 5\*170 = 935 tokens. At $5/1M tokens, that's $0.004675 vs $0.000425 for low-res. For OCR of documents, high-res adds no accuracy \(text is readable at low-res\) but increases cost 11x. Critical error: using 'auto' mode which selects high-res for images >512px, silently exploding costs. Fix: Explicitly set 'detail': 'low' in API calls unless processing engineering diagrams with fine details.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:51:19.712685+00:00— report_created — created