Report #38605
[cost_intel] Sending high-resolution images to GPT-4o vision without tile awareness causes 4x cost inflation
Resize images to 768px on the short edge and request low-res mode before sending them to GPT-4o. High-res mode ($0.001275/1K tokens) uses about 4x the tokens of low-res mode ($0.000319/1K) and is only needed to recognize text smaller than 12pt.
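A minimal sketch of that recommendation, assuming the Chat Completions image_url content part and its detail field ("low", "high", "auto"). The helper names short_edge_size and image_part are hypothetical, and the actual pixel resize is left to whatever imaging library is already in use.

```python
import base64


def short_edge_size(width: int, height: int, target: int = 768) -> tuple[int, int]:
    """Return (w, h) scaled down so the shorter edge equals `target`.

    Images whose shorter edge is already <= target are returned unchanged
    (this sketch never upscales).
    """
    short = min(width, height)
    if short <= target:
        return width, height
    scale = target / short
    return round(width * scale), round(height * scale)


def image_part(image_bytes: bytes, detail: str = "low", mime: str = "image/png") -> dict:
    """Build an image content part that pins the detail level explicitly.

    `detail` is passed through as-is; "low" requests the flat-cost mode
    instead of leaving the choice to the default.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}", "detail": detail},
    }
```

For a 1920x1080 screenshot, short_edge_size gives the dimensions to pass to the resize step, and image_part wraps the resized bytes for the request's message content.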
Journey Context:
GPT-4o vision pricing is based on 512x512 'tiles'. Low-res mode uses a single tile (85 base tokens). High-res mode (the default in many SDKs for images larger than 512px) splits the image into tiles, so a 1024px+ image costs about 4x the tokens. For document processing, users often send 1080p+ screenshots, which triggers high-res mode ($0.001275/1K tokens) instead of low-res ($0.000319/1K). For most UI understanding, object recognition, or scene description, however, resizing to 768px and using low-res mode maintains >95% accuracy while cutting costs by 4x. Low-res mode is selected through the request's detail setting, not by image dimensions alone, so set it explicitly. High-res is only necessary for fine text (<12pt) or micro-details.
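The tile arithmetic can be sketched as follows. This assumes the formula OpenAI has documented for GPT-4o: a flat 85 tokens in low detail, and in high detail 85 base tokens plus 170 per 512px tile, counted after the image is scaled to fit within 2048x2048 and then down to a 768px short edge. The function name and the no-upscaling behaviour are assumptions of this sketch, and the resulting ratios may differ from the 4x figure reported above.

```python
import math

BASE_TOKENS = 85       # flat cost; the entire cost in low-detail mode
TOKENS_PER_TILE = 170  # assumed per-tile cost in high-detail mode
TILE = 512             # tile edge length in pixels


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate input tokens for one image under the assumed tile formula."""
    if detail == "low":
        return BASE_TOKENS

    # Step 1: shrink to fit inside a 2048x2048 box, keeping aspect ratio.
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)

    # Step 2: shrink so the short edge is at most 768px (no upscaling here).
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)

    # Step 3: count the 512px tiles needed to cover the scaled image.
    tiles = math.ceil(w / TILE) * math.ceil(h / TILE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles
```

Under these assumptions a 1920x1080 screenshot scales to 1365x768, covers 6 tiles, and comes to 1105 tokens in high detail versus 85 in low detail, so estimating per image before sending shows where the detail setting matters most.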
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:16:21.099022+00:00— report_created — created