Report #41245
[cost\_intel] High-detail image processing in GPT-4 Vision can cost 2.25x more for a single extra pixel per side \(9 tiles instead of 4\) due to 512px tile quantization
Pre-resize images to exact multiples of 512px \(e.g., 1024px, 1536px\) to avoid partial tile billing; use 'low' detail mode for images where fine text isn't critical
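The pre-resize step can be sketched as a small helper that picks target dimensions before any actual resampling. This is a minimal sketch, not a verified workaround: the function name `snap_to_tile_grid` is hypothetical, and it assumes tiles are counted on the raw pixel dimensions, as this report describes. It shrinks the image, preserving aspect ratio, until it fits inside a box whose sides are multiples of 512px, and never upscales.

```python
TILE_PX = 512  # tile edge length in pixels, as stated in this report


def snap_to_tile_grid(width: int, height: int, tile: int = TILE_PX) -> tuple[int, int]:
    """Return dimensions that fit inside a tile-aligned box without upscaling.

    Hypothetical helper: each side's limit is the largest multiple of `tile`
    not exceeding that side (sides smaller than one tile are left alone),
    and the image is scaled down uniformly to fit inside that box.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")

    # Largest tile-aligned limit per side; small sides keep their size.
    box_w = width if width < tile else (width // tile) * tile
    box_h = height if height < tile else (height // tile) * tile

    # Integer comparison of box_w/width vs box_h/height avoids float rounding.
    if box_w * height <= box_h * width:
        # Width is the binding constraint.
        return box_w, max(1, height * box_w // width)
    # Height is the binding constraint.
    return max(1, width * box_h // height), box_h


if __name__ == "__main__":
    for size in [(1025, 1025), (1500, 1000), (300, 200)]:
        print(size, "->", snap_to_tile_grid(*size))
```

The returned size can then be passed to whatever image library is in use for the actual resize. Only one side is guaranteed to land on a multiple of 512px; the other is kept proportional so the image is not distorted.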
Journey Context:
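GPT-4 Vision bills high-detail images per 512x512 tile. A 1024x1024 image is exactly 4 tiles \(2x2\), but a 1025x1025 image needs a 3x3 grid \(9 tiles\) because the tile count along each side is rounded up: ceil\(1025/512\) = 3. One extra pixel per side therefore raises the cost 2.25x. Likewise, a 1500x1500 image is 3x3 = 9 tiles while 1024x1024 is 4 tiles: potentially the same visual information at 2.25x the cost.

The 'detail: high' parameter triggers this tiling, while 'detail: low' uses a single 512px thumbnail \(85 tokens, versus 170\*ntiles for high detail\). The trap is assuming high-res is 'better' for every image. For charts or screenshots with small text, high-res is necessary; for general object recognition, low-res suffices.

Order of magnitude: a 4K image \(3840px per side\) is 8x8 = 64 tiles, costing 64\*170 = 10880 tokens \(~$0.33, which implies a rate of roughly $0.03 per 1K input tokens\) versus 85 tokens \(~$0.0025\) in low-detail mode, about a 130x difference. These counts use raw pixel dimensions. OpenAI's published token calculation also downscales high-detail images before tiling \(to fit within 2048x2048, then so the shortest side is 768px\) and adds a flat 85 tokens to the tile cost, so treat the figures above as upper bounds for large images and check the current pricing documentation.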
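The arithmetic above can be reproduced with a short calculator. This is a sketch of the report's own model, not of the provider's billing code: tiles are counted on raw pixel dimensions with no pre-scaling and no base-token surcharge, and the `usd_per_1k_tokens` default of 0.03 is an assumption back-derived from the ~$0.33 figure. The function names are illustrative.

```python
TILE_PX = 512            # tile edge length, per this report
TOKENS_PER_TILE = 170    # high-detail cost per tile, per this report
LOW_DETAIL_TOKENS = 85   # flat low-detail cost, per this report


def tile_count(width: int, height: int) -> int:
    """Number of 512px tiles covering the image, rounding each side up."""
    cols = -(-width // TILE_PX)   # integer ceiling division
    rows = -(-height // TILE_PX)
    return cols * rows


def estimate_tokens(width: int, height: int, detail: str = "high") -> int:
    """Token estimate under the report's model (raw dimensions, no surcharge)."""
    if detail == "low":
        return LOW_DETAIL_TOKENS
    if detail == "high":
        return TOKENS_PER_TILE * tile_count(width, height)
    raise ValueError("detail must be 'low' or 'high'")


def estimate_usd(tokens: int, usd_per_1k_tokens: float = 0.03) -> float:
    """Dollar estimate; the default rate is an assumption, not a quoted price."""
    return tokens * usd_per_1k_tokens / 1000


if __name__ == "__main__":
    for side in (1024, 1025, 1500, 3840):
        tokens = estimate_tokens(side, side)
        print(f"{side}x{side}: {tile_count(side, side)} tiles, "
              f"{tokens} tokens, ~${estimate_usd(tokens):.4f}")
```

Running an estimate like this before upload makes the tile boundary visible, so a 1025px image can be caught and resized, or switched to low detail, before it is billed.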
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:42:10.465286+00:00— report_created — created