Report #29786
[cost_intel] High-resolution images consuming 3000+ tokens due to 512x512 tile calculations while users expect flat 85-token costs
Pre-resize images to at most 1024x1024 (or lower), use `detail: low` (fixed 85 tokens) for UI elements, and calculate tile costs before sending: `tiles = ceil(width/512) * ceil(height/512)`, `tokens = 85 + 170 * tiles`.
Journey Context:
GPT-4o and similar vision models bill high-detail images by 'tiles' (512x512 pixel blocks), not at a flat rate. Applied to raw dimensions, the tile formula splits a 2048x2048 image into 16 tiles (4x4), costing 85 base tokens + 16*170 = 2805 tokens. Note that OpenAI's documentation describes a downscaling step before tiles are counted (fit within 2048x2048, then scale the shortest side to 768px), so the amount actually billed for a large image can be lower than the raw-dimension figure; treat the formula as an upper-bound estimate and check the current docs. Many developers assume 'an image is like a sentence' (~50 tokens) and are shocked when a single screenshot costs more than a full document. The trap is sending high-res screenshots (e.g., 4K monitor captures) without resizing. The model downscales significantly for actual processing anyway, so the extra resolution adds cost without adding usable detail. The fix is aggressive pre-processing: resize images so the longest side is at most 1024px (which caps tiles at 4, i.e. 85 + 4*170 = 765 tokens), and use `detail: low` (fixed 85 tokens) for any image where fine text isn't critical. You can calculate the cost before the API call using the tile formula in the docs.
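The estimate and the resize target described in this report can be sketched in a few lines of Python. This is a minimal sketch, not an official calculator: the function names `estimate_tokens` and `fit_longest_side` are hypothetical, the constants (85, 170, 512) are the ones quoted in this report, and the raw-dimension formula deliberately does not model any server-side rescaling, so it may overestimate what is actually billed.

```python
import math
from typing import Tuple

BASE_TOKENS = 85        # flat cost quoted in the report (also the 'detail: low' cost)
TOKENS_PER_TILE = 170   # per-tile cost quoted in the report
TILE_SIZE = 512         # tile edge length in pixels


def estimate_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate image tokens with the report's raw tile formula.

    Assumption: no server-side rescaling is modeled, so treat the
    result as an upper-bound estimate rather than the billed amount.
    """
    if detail == "low":
        return BASE_TOKENS
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


def fit_longest_side(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Return dimensions with the longest side capped at max_side.

    Aspect ratio is preserved and small images are never upscaled.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    original = (3840, 2160)                 # a 4K screenshot
    resized = fit_longest_side(*original)   # target size for pre-resizing
    print("original:", original, estimate_tokens(*original))
    print("resized: ", resized, estimate_tokens(*resized))
    print("low detail:", estimate_tokens(*original, detail="low"))
```

With these assumptions, a 3840x2160 capture comes out at 6885 tokens by the raw formula, the 1024x576 resize target at 765, and `detail: low` at 85. The returned dimensions can be passed to whatever image library does the actual resizing before upload.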
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:23:08.972493+00:00— report_created — created