Report #85452
[cost_intel] Vision API costs explode with high-resolution image inputs
Pre-resize images before upload: use 'low' detail mode (fixed 85 tokens for any input size) where fine detail is not needed, or cap the short edge at 768px for 'high' detail (typically 4-6 tiles); avoid sending 4K screenshots
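The pre-resize step can be sketched as a pure size calculation. This is a minimal sketch, not a vendor API: `fit_short_edge` is a hypothetical helper name, and the 768px / 2048px defaults are assumptions taken from the short-edge and bounding-box limits that OpenAI describes for 'high' detail images.

```python
def fit_short_edge(width, height, short_edge=768, long_edge=2048):
    """Return (new_width, new_height) that preserve aspect ratio.

    Hypothetical helper: shrinks the image so the short side is at most
    `short_edge` and the long side is at most `long_edge`.
    Never upscales (scale is capped at 1.0).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = min(
        1.0,
        short_edge / min(width, height),
        long_edge / max(width, height),
    )
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 4K screenshot shrinks to 1365x768; a small image is left alone.
print(fit_short_edge(3840, 2160))
print(fit_short_edge(400, 300))
```

The returned pair can be passed to an image library's resize call, for example Pillow's `Image.resize((w, h))`, before the image is encoded and uploaded.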
Journey Context:
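OpenAI's GPT-4 vision models bill 'high' detail images per 512x512px tile, at about 170 tokens per tile plus an 85-token base. Per OpenAI's published formula, the image is first scaled to fit within 2048x2048px and then scaled so its short edge is 768px, so a 2048x2048px image is billed as 4 tiles (765 tokens), not the 16 tiles its raw size suggests. That is still 9x the fixed 85 tokens of 'low' detail mode, and wide or tall images reach 6-8 tiles (1105-1445 tokens). Claude 3 does not use tiles: Anthropic documents roughly (width x height) / 750 tokens, with images downscaled once the long edge exceeds 1568px. Common trap: developers send full-size 4K or Retina MacBook screenshots (3000+ px wide) on every request. Token cost depends on pixel dimensions, not file size, and the image can cost more than the text generation portion of the request. Fix: downsample to 1024px width (4 tiles for a typical landscape screenshot) or smaller, or use the 'low' detail setting for icon/classification tasks (fixed token count regardless of size). Also: JPEG compression artifacts reportedly don't affect OCR significantly at these resolutions.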
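The arithmetic above can be checked with a small estimator. This is a sketch under stated assumptions: it follows the tile formula OpenAI has published for GPT-4-class vision models (fit within 2048x2048, shrink the short side to 768, 170 tokens per 512px tile plus an 85-token base) and assumes small images are not upscaled. The function name is hypothetical and the constants differ between models, so treat the output as an estimate and check the current pricing documentation.

```python
import math


def estimate_image_tokens(width, height, detail="high"):
    """Estimate image input tokens under the assumed OpenAI tile formula."""
    base_tokens, tokens_per_tile, tile_px = 85, 170, 512
    if detail == "low":
        return base_tokens  # fixed cost regardless of input size

    # Step 1: fit within a 2048x2048 box (assumption: never upscale).
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)

    # Step 2: shrink so the short side is at most 768px.
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)

    # Step 3: count the 512px tiles needed to cover the scaled image.
    tiles = math.ceil(w / tile_px) * math.ceil(h / tile_px)
    return base_tokens + tokens_per_tile * tiles


for size in [(2048, 2048), (3840, 2160), (1024, 640), (512, 512)]:
    print(size, estimate_image_tokens(*size), estimate_image_tokens(*size, detail="low"))
```

Running the estimator over a sample of real request sizes before and after downsampling is a cheap way to see whether images or text dominate the bill.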
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:00:59.534551+00:00— report_created — created