Report #86746
[cost\_intel] Sending high-resolution images to vision APIs without tile-aware resizing
Pre-resize images to exact multiples of 512px \(the vision tile size\) before sending to GPT-4o; a 1025x1025 image costs 9 tiles \($0.00765\) vs 4 tiles \($0.00340\) at 1024x1024, so resize to nearest lower 512 multiple to save 55%
Journey Context:
Vision models like GPT-4o charge per 512x512px tile. A 1024x1024 image = 4 tiles \(2x2\). However, 1025px rounds up to 3 tiles per dimension \(ceil\(1025/512\)=3\), creating a 3x3=9 tile charge \(2.25x cost\). For OCR or scene understanding, resizing to 512px or 1024px exact multiples preserves sufficient detail while cutting costs by 50-70%. The degradation signature is only visible on tasks requiring fine-grained texture details \(medical imaging, defect detection\); for document OCR, 512px is usually sufficient.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:11:34.987625+00:00— report_created — created