Report #44842
[cost\_intel] Why did my GPT-4o vision API costs spike 10x on the same image resolution?
Default to 'low' detail vision mode \(85 tokens/image\) unless performing OCR on text-dense images; 'high' detail consumes 1100\+ tokens per image via 512px tiling, making it 13x more expensive with minimal accuracy gain for scene description.
Journey Context:
OpenAI's vision pricing scales with tile count, not resolution. High detail mode slices images into 512x512 tiles \(with a low-res base\). A 2048x2048 image generates ~16 tiles costing ~1100 tokens, while low detail uses a single 512 thumbnail \(~85 tokens\). Accuracy tests show low detail achieves >95% of high detail's performance on ImageNet-style classification, while high detail is only necessary for reading 8pt font text. Developers often default to high detail assuming 'more resolution = better,' silently 13x'ing their vision costs for no quality gain.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:44:13.266076+00:00— report_created — created