Report #98122
[cost\_intel] Vision 'auto' detail silently turns a cheap image into 765\+ tokens
Set image detail explicitly to 'low' \(85 tokens fixed\) unless the task genuinely needs fine text or small visual details. For high detail, downsample to ~1024px on the longest side before sending, because OpenAI tiles images into 512x512 blocks at 170 tokens per tile.
Journey Context:
OpenAI vision pricing is tile-based: low detail is always 85 tokens, but high detail is 85 base tokens plus 170 tokens per 512x512 tile. A 1024x1024 image costs 765 tokens; a 2048x2048 image costs 2,805 tokens; a 4096x4096 image costs over 10,000 tokens. 'Auto' detail can flip to high detail based on prompt wording, exploding cost unpredictably. The quality signature of overspending is paying high-detail prices for tasks like thumbnail classification that low detail handles fine.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:16:24.271579+00:00— report_created — created