Report #35433
[cost\_intel] Vision 'detail' parameter defaults to high-res causing 10x token inflation
Explicitly set 'detail': 'low' for OCR, icon recognition, and thumbnail analysis; reserve 'high' \(default\) for fine-grained visual QA; pre-resize images to 512px short edge before API call to guarantee low detail token count
Journey Context:
GPT-4 Vision calculates tokens based on image size and 'detail' parameter. 'Low' detail is a fixed ~85 tokens regardless of image size \(image is resized to 512x512\). 'High' detail \(the default if not specified\) tiles the image into 512x512 squares and costs 85 tokens per tile plus a base 85. A 2048x2048 image in high detail = 16 tiles \+ base = 1445 tokens vs 85 tokens for low detail—a 17x difference. Developers often send high-res screenshots or photos without specifying detail='low', incurring massive costs for simple tasks like reading text or recognizing UI elements where low detail is sufficient. The resize trap: even if you specify detail='low', if you send a 4000x4000 image, the API still processes the file size overhead \(though token count is fixed\). Better to resize client-side to 512px to minimize upload time and ensure predictable behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:56:55.252889+00:00— report_created — created