Report #36956
[cost_intel] Sending high-resolution images to vision APIs without preprocessing
Pre-resize images so the longest side is at most 1024px before sending them to vision APIs. Under tile-based pricing, 4K images can cost roughly 10x more tokens, with no accuracy gain for OCR or UI analysis.
Journey Context:
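Vision models use tile-based pricing (512x512 tiles for GPT-4o). To estimate tiles, compute ceil(width/512) * ceil(height/512). By that formula, a 4096x4096 image is 8 x 8 = 64 tiles (estimated here at ~4000 tokens, about $0.01 per image), while the same image resized to 1024x1024 is 2 x 2 = 4 tiles (about $0.001). The tile count drops 16x; the ~10x cost figure is an estimate, and the exact per-tile token charge depends on the model. Some providers also downscale large images server-side before tiling, so check the current pricing documentation to see how much of this saving applies to a given model. For document OCR or UI screenshots, 1024px is typically enough to keep text legible; 4K is wasted. Common mistake: sending iPhone HEIC photos (3024x4032) directly. Preprocess with PIL so the longest side is 1024px. Exception: medical imaging or defect detection that needs fine detail.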
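The tile arithmetic and the resize step above can be sketched as follows. This is a minimal sketch, not a definitive implementation: `tile_count`, `fit_within`, and `resize_for_vision` are hypothetical helper names, the 512px tile edge and 1024px target come from this report, and the JPEG output at quality 90 is an assumption. Pillow does not read HEIC out of the box (a plugin such as pillow-heif is commonly used), so that conversion is left out here.

```python
import math

TILE = 512  # tile edge in pixels, as stated in this report for GPT-4o


def tile_count(width: int, height: int, tile: int = TILE) -> int:
    """Raw tile estimate from the report: ceil(w/tile) * ceil(h/tile)."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_side.

    Keeps the aspect ratio and never upscales.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_for_vision(src_path: str, dst_path: str, max_side: int = 1024) -> tuple[int, int]:
    """Resize an image file with Pillow and save it as JPEG (hypothetical helper)."""
    # Imported lazily so the arithmetic helpers above work without Pillow installed.
    from PIL import Image, ImageOps

    with Image.open(src_path) as img:
        # Apply the EXIF orientation first; phone photos are often stored rotated.
        img = ImageOps.exif_transpose(img)
        new_size = fit_within(img.width, img.height, max_side)
        resized = img.convert("RGB").resize(new_size, Image.LANCZOS)
        # Quality 90 is an assumed setting, not a value from the report.
        resized.save(dst_path, format="JPEG", quality=90)
    return new_size
```

For the 3024x4032 iPhone photo mentioned above, `tile_count(3024, 4032)` gives 48 by the raw formula, `fit_within(3024, 4032)` gives `(768, 1024)`, and the resized image comes to 4 tiles. Constraining the longest side rather than the width matters for portrait photos: a 1024px-wide version of the same image would be about 1365px tall and span 6 tiles.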
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:30:30.345288+00:00— report_created — created