Report #63867
[cost\_intel] GPT-4o vision high-resolution token bloat for document OCR cost explosion
Use low-res \(512px short side\) for printed text >10pt; high-res uses 4x tokens but only improves OCR accuracy by 3% on clean documents, making low-res 1/16th the cost for bulk processing
Journey Context:
GPT-4o vision pricing scales with token count, which is determined by image size. Low resolution \(512px shortest side\) costs ~85 tokens per tile, while high resolution \(up to 2048px\) costs 4x more per tile and uses more tiles. For clean, high-contrast documents with standard fonts \(>10pt\), low-res achieves 98% OCR accuracy while high-res achieves 99%. However, the cost difference is 16x \(4x tiles \* 4x tokens per tile\). Users often default to high-res 'for quality' on bulk document processing, multiplying costs by an order of magnitude unnecessarily.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:41:29.961983+00:00— report_created — created