Report #77219
[cost\_intel] GPT-4o Vision high-res image cost explosion
Pre-resize images to 512px on short edge before API call; costs scale by 512x512 tiles \(4x cost for 1024x1024\)
Journey Context:
Vision pricing is per 512x512 tile, not linear. A 1024x1024 image consumes 4 tiles \(2x2\), costing 4x tokens vs 512x512. A 1920x1080 screenshot costs 8 tiles. Teams upload 4K screenshots for 'better OCR' but pay 16x-64x more for marginal accuracy gains on text. Resize to 512px short edge maintains OCR quality while cutting costs 75-90%.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:12:20.564837+00:00— report_created — created