Report #35469
[cost_intel] Why does my GPT-4o vision API cost 10x more than expected for screenshots?
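Set detail to "low" (85 tokens flat) whenever fine detail is not needed; avoid leaving screenshots on "high" or "auto", where tiling applies. Per OpenAI's published GPT-4o vision pricing, high-detail mode scales the image to fit within 2048x2048, scales it again so the shortest side is at most 768px, then charges 170 tokens per 512x512 tile plus an 85-token base. A 2048x4096 screenshot is reduced to 768x1536, which is 6 tiles and 1105 tokens, often more than the text prompt. Pre-resize images to fit within 1024x1024 to stay at or under 4 tiles (765 tokens), or within 512x512 for a single tile (255 tokens, assuming small images are not upscaled).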
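The arithmetic can be sketched as a small estimator. This is a minimal sketch, not OpenAI's implementation: the function name `estimate_image_tokens` is hypothetical, the constants (85 base, 170 per tile, 2048 and 768 limits) are taken from the published GPT-4o pricing description, and rounding to whole pixels and never upscaling small images are assumptions. Other models may use different constants.

```python
import math

BASE_TOKENS = 85       # flat charge; also the entire cost in low-detail mode
TOKENS_PER_TILE = 170  # charge per 512x512 tile in high-detail mode
TILE = 512


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate image input tokens for GPT-4o (hypothetical helper)."""
    if detail == "low":
        return BASE_TOKENS

    w, h = float(width), float(height)

    # Step 1: scale down to fit within a 2048x2048 square.
    longest = max(w, h)
    if longest > 2048:
        scale = 2048 / longest
        w, h = w * scale, h * scale

    # Step 2: scale down so the shortest side is 768px.
    # Assumption: images already smaller than this are left as-is.
    shortest = min(w, h)
    if shortest > 768:
        scale = 768 / shortest
        w, h = w * scale, h * scale

    # Assumption: dimensions are rounded to whole pixels before tiling.
    w, h = round(w), round(h)
    tiles = math.ceil(w / TILE) * math.ceil(h / TILE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


if __name__ == "__main__":
    for size in [(2048, 4096), (3840, 2160), (1024, 1024), (512, 512)]:
        print(size, estimate_image_tokens(*size))
```

Under these assumptions a 2048x4096 image comes out at 1105 tokens and a 1024x1024 image at 765. Treat the result as an estimate and compare it with the usage figures returned by the API.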
Journey Context:
Developers often assume vision pricing is per-image, but OpenAI uses a tiling system: high-detail mode rescales the image, splits it into 512px squares, and charges 170 tokens per tile on top of an 85-token base. A standard 4K screenshot (3840x2160) is scaled to roughly 1365x768 and generates 6 tiles, consuming 1105 tokens before the model processes a single text token, about 13x the low-detail price. This silently inflates costs for screenshot-heavy RPA workflows. Low-detail mode, selected with detail "low", sends the model a 512x512 version of the image and costs a flat 85 tokens regardless of the original size, so choosing low detail where it suffices and resizing client-side otherwise are among the highest-ROI optimizations for vision pipelines.
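A client-side mitigation might look like the following sketch. It computes target dimensions for a pre-resize and builds an image content part with an explicit detail setting. The helper names `fit_within` and `image_part` are hypothetical; the dictionary shape follows the Chat Completions image_url content part as commonly documented and should be checked against the current API reference. The actual pixel resizing would be done with an imaging library and is not shown here.

```python
import base64


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Return dimensions scaled down so the longest side is <= max_side.

    Aspect ratio is preserved; images that already fit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def image_part(image_bytes: bytes, detail: str = "low", mime: str = "image/png") -> dict:
    """Build an image content part with an explicit detail level (hypothetical helper)."""
    if detail not in ("low", "high", "auto"):
        raise ValueError(f"unsupported detail level: {detail!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {
            "url": f"data:{mime};base64,{encoded}",
            "detail": detail,  # set explicitly rather than relying on "auto"
        },
    }


if __name__ == "__main__":
    # A 4K screenshot resized to fit within 1024px on its longest side.
    print(fit_within(3840, 2160, 1024))
    # Placeholder bytes stand in for real PNG data.
    print(image_part(b"placeholder", detail="low")["image_url"]["detail"])
```

Setting detail explicitly on every image keeps the cost predictable; whether low detail preserves enough of a screenshot's text to be usable depends on the task and is worth testing on sample inputs.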
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:00:02.181030+00:00— report_created — created