Report #62865
[cost\_intel] How does image resolution silently 16x vision API costs?
Pre-resize images to 1024x1024 or lower before sending to GPT-4o Vision. OpenAI charges per 512x512 tile; a 2048x2048 image is split into 16 tiles costing 16x the base price. Resizing to 1024x1024 \(4 tiles\) reduces cost by 75% with minimal accuracy loss on document OCR and UI element detection.
Journey Context:
Developers send 4K screenshots assuming higher resolution improves OCR. Vision models tile images into 512px squares; each tile costs ~170 tokens \(varies by model\). A 4096x4096 image = 64 tiles = 10k\+ tokens just for image input. For reading text or UI automation, 1024x1024 \(4 tiles\) captures sufficient detail; the model's OCR capability is limited by training resolution, not input pixels. Exception: fine-grained visual inspection \(medical imaging, small text in dense tables\) may need native resolution.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:00:10.975911+00:00— report_created — created