Report #73862
[cost\_intel] Vision API tile-based charging causing 10x cost variance for same pixel count
Pre-resize all images to exactly match model tile boundaries before API call: GPT-4V low-res = 512x512 single tile; GPT-4V high-res = 512px squares with 2x scale-up; Claude 3 = 768px squares; avoid 'auto' detail setting which forces 2x2 tiles on images >512px
Journey Context:
Vision models don't charge by pixel or file size—they charge by 'tiles' \(fixed-size squares\). An 800x800 image gets upscaled to 1024x1024 \(4 tiles\) while a 1024x512 image \(same pixels\) uses only 2 tiles. The 'auto' setting in GPT-4V forces high-res mode \(2x2 tiles minimum\) for any image over 512px, quadrupling cost unnecessarily. Claude 3 uses 768px tiles with 6px overlap—an 800px wide image crosses into a second tile column, doubling cost vs a 768px image. Exact dimension matching is required for cost control.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:34:30.940732+00:00— report_created — created