Report #69352
[cost\_intel] How does GPT-4o vision pricing scale with image resolution?
Pre-resize images to fit within 512x512 \(longest side 512px\) before GPT-4o Vision API calls; in high-detail mode a 1024x1024 image uses 4 tiles \(765 tokens\) versus 1 tile \(255 tokens\) for 512x512, cutting the tile count by 75% and image input tokens by about 67%.
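A minimal sketch of the target-size calculation, assuming the goal is to land in a single 512x512 tile. That requires the longest side, not the shortest, to be at most 512px; a 16:9 image resized to 512px on its shortest side would still span two tiles. The helper name `fit_within` is hypothetical and not part of any library.

```python
def fit_within(width: int, height: int, max_side: int = 512) -> tuple[int, int]:
    """Return aspect-preserving dimensions whose longest side is <= max_side.

    Images that already fit are returned unchanged (no upscaling).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels, never dropping below 1px on either side.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    for size in [(1920, 1080), (1024, 1024), (400, 300)]:
        print(size, "->", fit_within(*size))
```

If Pillow is available, `Image.thumbnail((512, 512))` should perform an equivalent aspect-preserving, downscale-only fit in place.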
Journey Context:
Teams often send full-resolution screenshots to vision models without realizing that pricing is tile-based. In high-detail mode, GPT-4o first scales the image to fit within 2048x2048, then scales it down so the shortest side is 768px, and charges 170 tokens per 512x512 tile plus a flat 85 base tokens \(low-detail mode costs a flat 85 tokens regardless of size\). A 1920x1080 screenshot is scaled to about 1365x768 and incurs 6 tiles \(1105 tokens\), roughly 4.3x the 255 tokens of a 512px thumbnail. For document OCR and UI understanding, 512px resolution often keeps text readable while sharply cutting costs. Reserve high resolution for fine-grained visual detail \(medical imaging, engineering diagrams\); for standard use cases the savings \(about 67% to 77% in the examples here\) dominate.
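The tile arithmetic can be sketched as a small estimator. It assumes OpenAI's published GPT-4o formula \(85 base tokens plus 170 per 512px tile, after fitting within 2048x2048 and scaling the shortest side down to 768px\), and it further assumes that images smaller than those limits are not upscaled. The function name `estimate_image_tokens` is hypothetical; treat the results as estimates and check them against actual API usage figures.

```python
import math

BASE_TOKENS = 85       # flat per-image cost (assumed from the published formula)
TOKENS_PER_TILE = 170  # cost per 512x512 tile in high-detail mode (assumed)
TILE = 512


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate GPT-4o input tokens for one image."""
    if detail == "low":
        return BASE_TOKENS  # low detail is a flat cost regardless of size

    w, h = width, height
    # Step 1: shrink to fit within 2048x2048 (assumption: never upscale).
    scale = min(1.0, 2048 / max(w, h))
    w, h = round(w * scale), round(h * scale)
    # Step 2: shrink so the shortest side is at most 768px.
    scale = min(1.0, 768 / min(w, h))
    w, h = round(w * scale), round(h * scale)
    # Step 3: count the 512px tiles needed to cover the scaled image.
    tiles = math.ceil(w / TILE) * math.ceil(h / TILE)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


if __name__ == "__main__":
    for size in [(512, 512), (1024, 1024), (1920, 1080)]:
        print(size, estimate_image_tokens(*size))
```

Running the estimator over a sample of real inputs before and after resizing gives a quick view of the expected savings for a given workload.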
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:53:37.593679+00:00— report_created — created