Report #87185
[cost_intel] Sending high-resolution images to GPT-4o vision without calculating vision token costs, assuming per-image pricing
GPT-4o charges per 512x512 'tile' (170 tokens each); a 2048x2048 image costs 170 * 16 = 2720 tokens (~$0.008) while a 512x512 image costs 170 tokens (~$0.0005). Downsample images to <=1024px on the shortest side unless OCR requires high resolution; for document parsing, resize to 1024px width, since a 1024x1024 image is 4 tiles (680 tokens) versus 16 tiles at 2048x2048.
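The tile arithmetic above can be sketched in a few lines. This is a minimal estimate that implements only the model stated in this report (512px tiles, 170 tokens each, partial tiles rounded up); it does not model any rescaling the API may apply before tiling or any fixed per-image overhead, so compare its output against the `usage` figures returned by the API. The function names `estimate_tile_tokens` and `fit_shortest_side` are hypothetical helpers, not part of any SDK.

```python
import math

TILE_PX = 512          # tile edge length as stated in this report
TOKENS_PER_TILE = 170  # tokens per tile as stated in this report


def estimate_tile_tokens(width: int, height: int) -> int:
    """Estimate vision tokens as (number of 512px tiles) * 170.

    Partial tiles are rounded up. This is the report's simplified model,
    not a reproduction of the provider's billing logic.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    tiles = math.ceil(width / TILE_PX) * math.ceil(height / TILE_PX)
    return tiles * TOKENS_PER_TILE


def fit_shortest_side(width: int, height: int, max_short: int = 1024) -> tuple[int, int]:
    """Return dimensions scaled down so the shortest side is <= max_short.

    Aspect ratio is preserved; images already small enough are returned
    unchanged (this helper never upscales).
    """
    short = min(width, height)
    if short <= max_short:
        return width, height
    scale = max_short / short
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    for size in [(512, 512), (1024, 1024), (2048, 2048)]:
        print(size, estimate_tile_tokens(*size), "tokens (estimated)")
```

The target size from `fit_shortest_side` can be passed to whatever image library is already in use to do the actual resampling before upload.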
Journey Context:
Developers often assume vision input is 'cheap' or flat-rate, but OpenAI's vision pricing is token-based and counted in tiles. A page scanned at print resolution (300dpi, 2550x3300) spans 5 x 7 = 35 tiles, or 170 * 35 = 5950 tokens (~$0.017 at the per-token rate implied by the figures above). For high-volume document processing (1000 pages/day), that is about $17/day, versus about $3/day if each page is resized to 1024px width (1024x1325, 6 tiles, 1020 tokens). The quality degradation for text extraction between 2048px and 1024px is minimal for standard fonts, making the 4x tile reduction (16 tiles down to 4) a clear win unless processing fine print or charts. The signature cost spike is sending 4K screenshots without resizing.
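The page-volume arithmetic can be checked with a short script. This is a sketch under the report's tile model (512px tiles, 170 tokens each, partial tiles rounded up); the price of 3.0 USD per million input tokens is an assumed placeholder roughly matching the report's figures, and should be replaced with the current published rate. The helper names are hypothetical.

```python
import math

TILE_PX = 512
TOKENS_PER_TILE = 170


def tile_count(width: int, height: int) -> int:
    """Number of 512px tiles covering the image, rounding partial tiles up."""
    return math.ceil(width / TILE_PX) * math.ceil(height / TILE_PX)


def resize_to_width(width: int, height: int, target_width: int = 1024) -> tuple[int, int]:
    """Scale dimensions down to target_width, preserving aspect ratio."""
    if width <= target_width:
        return width, height
    return target_width, max(1, round(height * target_width / width))


def daily_cost_usd(width: int, height: int, pages_per_day: int,
                   usd_per_million_tokens: float) -> float:
    """Estimated daily image-token spend for pages of a given size."""
    tokens_per_page = tile_count(width, height) * TOKENS_PER_TILE
    return tokens_per_page * pages_per_day * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    ASSUMED_RATE = 3.0  # USD per 1M input tokens; placeholder, not a quoted price
    full = (2550, 3300)                # 300dpi letter-size page
    small = resize_to_width(*full)     # same page at 1024px width
    for label, (w, h) in [("full", full), ("resized", small)]:
        cost = daily_cost_usd(w, h, 1000, ASSUMED_RATE)
        print(f"{label}: {w}x{h}, {tile_count(w, h)} tiles, ${cost:.2f}/day")
```

Because tile counts round up, trimming a few pixels so a dimension lands on a multiple of 512 can remove an entire row or column of tiles under this model.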
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:55:49.531156+00:00— report_created — created