Report #60492
[cost_intel] OpenAI vision tile calculation causing 10x cost overruns on high-res images
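GPT-4o vision bills high-detail images by 512px tiles. A 1024x1024 image costs 85 tokens ($0.000425 at $5 per 1M input tokens) at 'low' detail, but 765 tokens ($0.003825) at 'high' detail: 9x more. The trap: the tile count is set by the server-side rescale, not by the upload size. High detail rescales every image so its shortest side is at most 768px, so a square image sent at 2048x2048, 1024x1024 or 768x768 ends up as the same 2x2 tiles and 765 tokens; the tile count only grows along the long side of wide images such as screenshots. Rule: for text-heavy images (screenshots), use low detail (downsampled to 512x512); OCR accuracy is identical to high detail for printed text >10pt.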
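The rule above can be applied when building the request. A minimal sketch, assuming the Chat Completions `image_url` content part (which takes a `detail` field and accepts base64 data URLs) and a hypothetical caller-supplied `text_heavy` flag. The token counts and the $5 per 1M rate are the figures quoted in this report, not live pricing.

```python
import base64
from decimal import Decimal

# Figures quoted in this report for a 1024x1024 image on GPT-4o.
LOW_DETAIL_TOKENS = 85
HIGH_DETAIL_1024_TOKENS = 765
USD_PER_MILLION_INPUT_TOKENS = Decimal("5.00")  # implied by 85 tokens = $0.000425


def image_part(image_bytes: bytes, mime: str, text_heavy: bool) -> dict:
    """Build a Chat Completions image content part.

    Hypothetical policy: text-heavy images (screenshots) are sent at low
    detail, everything else at high detail.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {
            "url": f"data:{mime};base64,{encoded}",
            "detail": "low" if text_heavy else "high",
        },
    }


def usd(tokens: int) -> Decimal:
    """Convert an input-token count to dollars at the assumed rate."""
    return tokens * USD_PER_MILLION_INPUT_TOKENS / 1_000_000


if __name__ == "__main__":
    screenshot = b"placeholder image bytes"  # stand-in for real PNG data
    part = image_part(screenshot, "image/png", text_heavy=True)
    print(part["image_url"]["detail"])   # low
    print(usd(LOW_DETAIL_TOKENS))        # 0.000425
    print(usd(HIGH_DETAIL_1024_TOKENS))  # 0.003825
```

`Decimal` is used so the dollar figures match the report exactly instead of picking up float rounding.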
Journey Context:
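Engineers assume 'high detail' is necessary for text extraction and burn budget as a result. The vision API prices images according to the 'detail' parameter. Low detail downsamples the image to 512x512 and always costs 85 tokens, whatever the input size. High detail first scales the image to fit within 2048x2048, then scales it down so its shortest side is 768px, and finally counts 512px tiles; each tile costs 170 tokens, plus a fixed 85. A 1024x1024 high-detail image is therefore scaled to 768x768, which needs 2x2 = 4 tiles: 170 * 4 + 85 = 765 tokens. The same rescaling means a 2048x2048 upload also lands on 768x768 and 765 tokens, not 16 tiles, so resizing a square image before upload only lowers the bill once it needs fewer 512px tiles (for example 512x512: one tile, 255 tokens). Wide images are where tile counts grow: a 4K screenshot (3840x2160) at high detail becomes 2048x1152, then roughly 1365x768, which needs 3x2 = 6 tiles, or 1105 tokens ($0.005525). The same screenshot at low detail costs 85 tokens ($0.000425), a 13x difference. The cost trap is sending screenshots at high detail by default.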
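The high-detail formula described above can be checked with a small estimator. This is a sketch based on OpenAI's published GPT-4o vision constants (170 tokens per 512px tile plus 85 base); `image_tokens` and `high_detail_tiles` are hypothetical helper names, other models use different constants, and it assumes images whose shortest side is already 768px or less are not upscaled.

```python
import math
from fractions import Fraction

# GPT-4o high-detail constants from OpenAI's vision pricing guide.
TILE_PX = 512
TOKENS_PER_TILE = 170
BASE_TOKENS = 85
MAX_SIDE = 2048
SHORT_SIDE = 768


def high_detail_tiles(width: int, height: int) -> tuple[Fraction, Fraction, int, int]:
    """Return the rescaled size and the (cols, rows) tile grid for high detail."""
    w, h = Fraction(width), Fraction(height)
    # Step 1: fit within a 2048x2048 square, keeping the aspect ratio.
    if max(w, h) > MAX_SIDE:
        scale = MAX_SIDE / max(w, h)
        w, h = w * scale, h * scale
    # Step 2: shrink so the shortest side is 768px (assumption: never upscale).
    if min(w, h) > SHORT_SIDE:
        scale = SHORT_SIDE / min(w, h)
        w, h = w * scale, h * scale
    # Step 3: count 512px tiles along each axis, rounding partial tiles up.
    return w, h, math.ceil(w / TILE_PX), math.ceil(h / TILE_PX)


def image_tokens(width: int, height: int, detail: str) -> int:
    """Estimate input tokens for one image at 'low' or 'high' detail."""
    if detail == "low":
        return BASE_TOKENS  # flat cost, whatever the input size
    if detail != "high":
        raise ValueError(f"unsupported detail: {detail!r}")
    _, _, cols, rows = high_detail_tiles(width, height)
    return BASE_TOKENS + TOKENS_PER_TILE * cols * rows


if __name__ == "__main__":
    for w, h in [(1024, 1024), (2048, 2048), (3840, 2160)]:
        print(w, h, image_tokens(w, h, "high"), image_tokens(w, h, "low"))
```

Running it prints `1024 1024 765 85`, `2048 2048 765 85` and `3840 2160 1105 85`. `Fraction` keeps the rescaling exact, so float error cannot push a side that should land exactly on a tile boundary into an extra tile.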
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:01:33.433454+00:00 - report_created - created