Report #59954
[cost_intel] High-resolution images consume 4-16x more tokens than low-res due to automatic tile splitting, exploding costs for image-heavy workflows
Pre-resize images to 512px on the shortest side before API submission, or explicitly set 'low' resolution mode, unless OCR of fine details is required.
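A minimal sketch of both options, using only the standard library. The helper names `shrink_to_shortest_side` and `low_detail_image_part` are hypothetical and not part of any SDK. The content-part shape assumes OpenAI's Chat Completions image format with its `detail` field; other providers use different request shapes. The first helper only computes target dimensions, so the actual pixel resize is left to whatever image library is already in use.

```python
from __future__ import annotations

import base64


def shrink_to_shortest_side(width: int, height: int, shortest: int = 512) -> tuple[int, int]:
    """Return (w, h) scaled so the shortest side is at most `shortest` px.

    Aspect ratio is preserved and images are never upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    short = min(width, height)
    if short <= shortest:
        return width, height  # already small enough
    scale = shortest / short
    return max(1, round(width * scale)), max(1, round(height * scale))


def low_detail_image_part(image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an image content part that explicitly requests low detail.

    Assumption: the target API accepts OpenAI-style `image_url` parts
    with a `detail` field.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{b64}", "detail": "low"},
    }
```

For a 3840x2160 screenshot, `shrink_to_shortest_side(3840, 2160)` gives `(910, 512)`, which can be passed to the resize call of an image library before encoding.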
Journey Context:
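GPT-4o and Claude 3.5 Sonnet both price images by pixel dimensions, but they do it differently. In OpenAI's 'high' detail mode the image is scaled to fit within 2048x2048, scaled again so its shortest side is 768px, and then split into 512x512 tiles; each tile costs 170 tokens on top of an 85-token base. A 2048x2048 image therefore ends up as 768x768, which is 4 tiles and 765 tokens, and an elongated image can reach 8 tiles and 1,445 tokens, versus a flat 85 tokens for the same image in 'low' mode. Anthropic does not tile: it charges roughly (width x height) / 750 tokens and downscales anything larger, which caps a single image at about 1,600 tokens. In a 10-image conversation that is on the order of 8k-16k tokens of image context per request, and because the whole history is re-sent on every turn, a few turns can add up to 40k tokens ($0.20-0.40) in image context alone, often exceeding the text generation cost. The trap is that 'auto' or 'high' mode is the default in some SDKs, and developers don't realize their 4K screenshots are being billed at the high-detail rate.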
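The arithmetic can be checked with a small estimator. This is a sketch under stated assumptions, not either vendor's billing code. It assumes OpenAI's published high-detail rule (fit within 2048x2048, shortest side down to 768px, 170 tokens per 512px tile plus an 85-token base) and Anthropic's published approximation of (width x height) / 750 tokens with a 1568px long-edge limit and a cap near 1,600 tokens. The function names are hypothetical, and the choice to never upscale small images is an assumption.

```python
import math

LOW_DETAIL_TOKENS = 85  # flat cost of an OpenAI 'low' detail image


def openai_high_detail_tokens(width: int, height: int) -> int:
    """Estimate tokens for one image in OpenAI 'high' detail mode."""
    # Step 1: fit inside a 2048x2048 square (downscale only).
    scale = min(1.0, 2048 / max(width, height))
    w, h = round(width * scale), round(height * scale)
    # Step 2: bring the shortest side down to 768px (downscale only; assumption).
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)
    # Step 3: count 512px tiles; 170 tokens per tile plus the 85-token base.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


def anthropic_tokens_estimate(width: int, height: int) -> int:
    """Rough per-image estimate for Claude: pixels / 750, capped near 1,600."""
    scale = min(1.0, 1568 / max(width, height))
    w, h = round(width * scale), round(height * scale)
    return min(math.ceil(w * h / 750), 1600)


if __name__ == "__main__":
    for size in [(512, 512), (2048, 2048), (3840, 2160)]:
        high = openai_high_detail_tokens(*size)
        print(size, "openai high:", high, "ratio vs low:", round(high / LOW_DETAIL_TOKENS, 1),
              "anthropic est:", anthropic_tokens_estimate(*size))
```

Running the estimator over a representative sample of a workload's image sizes before and after resizing gives a quick read on how much of the token bill comes from images.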
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T07:07:17.759091+00:00— report_created — created