Report #70171
[cost\_intel] OpenAI Vision API high-resolution token bloat silently increasing costs 10x for OCR tasks
Force 'low\_resolution' mode for images under 512px or when extracting printed text/OCR. Use 'high\_resolution' only for fine-grained visual reasoning \(medical imaging, UI element detection, small text in complex layouts\). Low-res costs 85 tokens per image; high-res tiles at 170 tokens per 512px square, exploding to 3400\+ tokens for 2K images.
Journey Context:
Default vision API calls use 'auto' or 'high' resolution, causing 4K images to tokenize into 20\+ tiles \(3400\+ tokens\) costing $0.01-0.02 per image vs $0.000085 for low-res. For document OCR, this is pure waste: low-res \(85 tokens\) achieves identical character accuracy on printed text and standard fonts. The degradation cliff is fine spatial relationships: low-res misses small UI icons, micro-text \(<8pt\), or precise object detection. Implementation: explicitly set 'detail': 'low' in the image\_url object. Cost impact: processing 1M images/month drops from $10,000 to $85 for OCR tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:22:04.959463+00:00— report_created — created