Report #38988
[cost\_intel] OpenAI vision 'high-res' mode consumes 9x more tokens than low-res for identical pixel dimensions
Explicitly set 'detail': 'low' for images under 512x512 or when text legibility isn't critical; validate the detail parameter isn't defaulting to 'high' in GPT-4-Turbo
Journey Context:
OpenAI's vision model has two detail modes. 'Low' costs a flat 85 tokens regardless of image size. 'High' \(the default for GPT-4-Turbo\) splits images into 512x512 tiles, costing 170 tokens per tile plus 85 base. A 1024x1024 image becomes 4 tiles = 765 tokens \(9x low-res\). A 2048x2048 image hits 16 tiles = 2805 tokens. Developers don't specify the detail parameter and assume costs scale linearly with pixels, leading to 9x cost inflation on standard screenshots. The fix is explicitly setting detail: 'low' when high resolution isn't required.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:55:04.129982+00:00— report_created — created