Report #37732
[cost_intel] Image resolution impact on vision model token costs: sending full-resolution images without resizing
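Resize images to the minimum resolution that preserves task-relevant detail before sending them to vision APIs. Under tile-based pricing, an image is split into 512px tiles, and each tile costs roughly 170 tokens on top of a base cost of roughly 85 tokens. Tiled as-is, a 2048x2048 image spans 16 tiles and costs roughly 2,800 tokens; the same content at 512x512 fits in one tile and costs roughly 255 tokens, about a 10x reduction. Some providers downscale large images before tiling, so the billed figure can be lower; check the provider's current token calculation. Most OCR and text extraction tasks work fine at 512-768px width.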
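The arithmetic behind these figures can be sketched as follows. This is a minimal estimate, assuming the image is tiled as-is with 512px tiles, 170 tokens per tile, and an 85-token base cost (the base is inferred from the 255-token figure for a single tile). The name `tile_tokens` is illustrative, and real providers may rescale the image before tiling, so treat the result as a rough estimate and not as a billing formula.

```python
import math

TILE_PX = 512          # tile edge length in pixels
TOKENS_PER_TILE = 170  # approximate per-tile cost from the report
BASE_TOKENS = 85       # assumption: inferred from 255 tokens for one tile


def tile_tokens(width: int, height: int) -> int:
    """Estimate tokens for an image tiled as-is (no provider-side rescaling)."""
    tiles = math.ceil(width / TILE_PX) * math.ceil(height / TILE_PX)
    return tiles * TOKENS_PER_TILE + BASE_TOKENS


for size in [(2048, 2048), (512, 512)]:
    print(size, tile_tokens(*size))
# (2048, 2048) 2805
# (512, 512) 255
```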
Journey Context:
Vision model token costs scale with image resolution through a tile-based system: the image is divided into 512x512 pixel tiles, and each tile is charged roughly 170 tokens on top of a fixed base cost. Cost therefore scales roughly with pixel area, so doubling both width and height roughly quadruples the tile count and the cost. Providers may downscale oversized images before tiling (OpenAI's documentation describes such a step for high-detail mode), which caps the cost of very large inputs, so verify these estimates against the provider's current token calculation. The common mistake is sending raw camera photos or screenshots (typically 2000x3000 pixels or larger) directly to the API without resizing. A single unoptimized photo can cost 2,000+ input tokens. At scale, say 100K images/month for a document processing pipeline, an extra 2,000 or more tokens per image adds up to hundreds of millions of unnecessary input tokens per month, which at GPT-4o pricing can mean hundreds to thousands of dollars. Most vision tasks do not require full resolution: text extraction and OCR of normally sized text work reliably at 512-768px width, object detection works at 768-1024px, and only detailed analysis, such as reading small printed text, needs higher resolution. The optimization is straightforward: resize to the minimum width that preserves task-relevant detail before the API call. To find that width, downscale a representative sample set and compare extraction quality against the full-resolution results. Also consider the detail parameter: setting detail to 'low' processes the entire image as a single 512px tile for a fixed low token cost, which suits tasks where overall image context matters more than fine detail.
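The resize step and the detail parameter could be wired together roughly as below. This is a sketch, assuming the OpenAI Chat Completions `image_url` content-part shape with its `detail` field. `fit_to_width` and `image_part` are hypothetical helper names, the JPEG MIME type is an assumption, and the actual pixel resampling is left to an imaging library.

```python
import base64

ALLOWED_DETAIL = {"low", "high", "auto"}


def fit_to_width(width: int, height: int, max_width: int) -> tuple[int, int]:
    """Return (w, h) scaled down to max_width, preserving aspect ratio."""
    if width <= max_width:
        return width, height  # never upscale a small image
    return max_width, max(1, round(height * max_width / width))


def image_part(jpeg_bytes: bytes, detail: str = "auto") -> dict:
    """Build one image content part for a chat message (no network call)."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {sorted(ALLOWED_DETAIL)}")
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/jpeg;base64,{encoded}",
            "detail": detail,
        },
    }


# Target size for a 2000x3000 photo capped at 768px width.
print(fit_to_width(2000, 3000, 768))  # (768, 1152)
```

With Pillow, for example, the computed size would be passed to `Image.resize` before the bytes are encoded. Whether a 768px cap is enough for a given task still has to be checked on a representative sample set.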
Lifecycle
2026-06-18T17:48:46.839290+00:00— report_created — created