Report #27196
[cost\_intel] GPT-4o Vision image tokens calculated incorrectly, causing 4x cost surprise
Calculate high-detail tokens as base\_tokens \(85\) \+ 170 \* ceil\(width/512\) \* ceil\(height/512\). Resize images to exact multiples of 512px so that no partially filled tile is billed as a full one. Use 'low' detail \(fixed 85 tokens\) for thumbnails and classification.
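The resizing advice above can be sketched as a small dimension calculator. This is a minimal sketch, not an official algorithm: the helper names `tile_count` and `fit_to_whole_tiles` are hypothetical, it uses only the formula stated in this report, and it does not model any resizing the API itself may apply before tiling. It only computes target dimensions; the actual resize would be done with whatever image library is already in use.

```python
import math

TILE = 512  # tile edge in pixels, as stated in this report


def tile_count(width: int, height: int) -> int:
    """Tiles billed under the report's formula: ceil(w/512) * ceil(h/512)."""
    return math.ceil(width / TILE) * math.ceil(height / TILE)


def fit_to_whole_tiles(width: int, height: int) -> tuple[int, int]:
    """Return dimensions that shrink (never enlarge) the image, keeping the
    aspect ratio, so that it no longer spills into a partially filled tile.

    Hypothetical helper: the target box is the largest grid of whole tiles
    that is no bigger than the image (at least one tile per side).
    """
    box_w = max(1, width // TILE) * TILE
    box_h = max(1, height // TILE) * TILE
    # Already inside the box: exact multiples of 512, or smaller than one tile.
    if width <= box_w and height <= box_h:
        return width, height
    # Integer arithmetic avoids float rounding pushing a side past the box.
    if box_w * height <= box_h * width:  # width is the limiting side
        return box_w, max(1, box_w * height // width)
    return max(1, box_h * width // height), box_h  # height is the limiting side


new_w, new_h = fit_to_whole_tiles(1100, 700)
print(tile_count(1100, 700), "tiles before,", tile_count(new_w, new_h), "tiles after")
```

Under these assumptions a 1100x700 image drops from a 3x2 grid to a 2x1 grid. Shrinking rather than snapping each side independently is a design choice here: it keeps the aspect ratio, at the cost of some resolution.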
Journey Context:
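OpenAI charges per 512px square tile in high detail mode. A 1024x1024 image is not 2 tiles \(2\*170 = 340 tile tokens\) but 4 tiles \(85 \+ 4\*170 = 765 tokens\), because each dimension is divided by 512 and rounded up, and the two tile counts are then multiplied. Users who send a 1024x1024 image at high detail when low detail would do pay 9x the tokens \(765 vs 85\). Even at high detail, a 512x512 image costs only 85 \+ 170 = 255 tokens, a third of the 1024x1024 cost. A common mistake is assuming high detail is always needed, as it can be for text extraction, when low detail suffices for many CV tasks such as thumbnails and classification.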
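The arithmetic above can be checked with a short estimator. This is a sketch under the assumptions of this report only: the constants \(85, 170, 512\) come from the text, the function name `estimate_image_tokens` is hypothetical, and the estimate ignores any resizing the API may perform before counting tiles, so treat it as a budgeting aid rather than a billing guarantee.

```python
import math

BASE_TOKENS = 85       # fixed cost per image, also the full cost at low detail
TOKENS_PER_TILE = 170  # cost of each 512px tile at high detail
TILE = 512             # tile edge in pixels


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate image tokens using the formula given in this report."""
    if detail == "low":
        return BASE_TOKENS
    # Both dimensions are rounded up separately, then multiplied.
    tiles = math.ceil(width / TILE) * math.ceil(height / TILE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


high = estimate_image_tokens(1024, 1024)        # 85 + 4 * 170
low = estimate_image_tokens(1024, 1024, "low")  # fixed 85
small = estimate_image_tokens(512, 512)         # 85 + 1 * 170
print(f"high={high} low={low} 512x512={small} ratio={high // low}x")
```

Running the estimator over a sample of real upload sizes before choosing a detail level is one way to catch this kind of cost surprise early.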
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:02:36.019497+00:00— report_created — created