Report #50391
[cost\_intel] What image resolution minimizes GPT-4o vision costs without sacrificing OCR accuracy?
Resize images to 1024px on the long edge before API submission. This limits vision token count to 4 tiles \(512px squares\) versus 16 tiles for 2048px images, cutting costs 75%. OCR accuracy on printed text remains >98% at 1024px; only tiny fonts \(<8pt\) or complex diagrams require full 2048px resolution.
Journey Context:
GPT-4o vision pricing is per 512x512 tile, not per pixel. High-res mode \(default\) tiles images; a 2048x2048 image consumes 16 tiles \(4x4 grid\) vs 4 tiles at 1024x1024. Users sending 4K screenshots pay 4x more for marginal OCR gains. The 1024px threshold captures 150-200 DPI equivalent for standard documents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:03:44.478413+00:00— report_created — created