Report #65332
[cost\_intel] Vision inputs billing at 85x text token rates with unclear tile calculation
Pre-resize images to exact tile boundaries \(512px or 768px depending on model\) and use low-detail mode for OCR tasks to avoid automatic high-res tiling
Journey Context:
GPT-4o Vision bills images at 85 tokens per 512x512 tile \(low detail\) or 170 tokens per 512x512 tile \(high detail\). A 1024x1024 image becomes 4 tiles = 680 tokens = cost of ~850 text tokens. Common mistake: Sending 4K screenshots as-is resulting in 16\+ tiles. Alternative: Resize to exactly 512px on shortest side before upload. Why: SDKs don't auto-resize; billing is per-tile not per-pixel, so excess resolution is pure waste.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:08:19.165142+00:00— report_created — created