Report #84387
[cost\_intel] Vision API token bloat in high-res mode for document OCR
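Force 'low\_res' \(512px\) mode, set with `detail: "low"` on the image input, for printed text documents with >12pt font; use 'high\_res' \(`detail: "high"`\) only for fine details \(signatures, stamps, microtext\). Low-res costs a flat 85 tokens \($0.000425 at $5/1M\). High-res, per OpenAI's published GPT-4o formula, costs 170 tokens per 512px tile plus an 85-token base, counted after the image is scaled to fit within 2048x2048 and its shortest side is reduced to 768px. For example, a 2048x2048 image is reduced to 768x768, which is 4 tiles: 4 x 170 + 85 = 765 tokens \($0.003825\) in high-res vs 85 tokens \($0.000425\) in low-res, a 9x difference.
Journey Context:
GPT-4o vision pricing scales with image tiles. Low-res mode gives the model a 512x512px version of the image and costs a flat 85 tokens. High-res mode scales the image to fit within a 2048x2048 square, scales the shortest side to 768px, then counts 512px tiles at 170 tokens each and adds 85 base tokens. A standard 8.5x11" PDF page at 300 DPI is 2550x3300 pixels. In high-res it is reduced to roughly 768x994, which is 4 tiles = 765 tokens = about $0.0038 per page at $5/1M, against $0.000425 per page in low-res. For OCR of standard documents, low-res 512px is reported here to capture text above 8pt clearly, and the >12pt cutoff in the recommendation is the more conservative of the two thresholds. The quality cliff is on microprint, handwritten annotations, or complex diagrams. At 10,000 pages/day, for example, the gap is about $34/day, or over $12,000 a year, so teams processing thousands of pages/day can waste thousands of dollars a year by leaving every page on the default `auto` setting, which can select high-res for large page scans.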
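The per-page arithmetic can be checked with a short Python sketch. It assumes OpenAI's published GPT-4o tile formula: fit within 2048x2048, shortest side to 768px, 170 tokens per 512px tile plus 85 base tokens, and a flat 85 tokens in low detail. The function names, the $5/1M price constant, the rounding of intermediate sizes, and the choice not to upscale small images are illustrative assumptions, not part of the report. The `image_part` helper is hypothetical; it only builds the Chat Completions image content part with an explicit `detail` value and makes no API call.

```python
import math

LOW_DETAIL_TOKENS = 85        # flat cost of an image sent with detail "low"
BASE_TOKENS = 85              # added once to every detail "high" image
TOKENS_PER_TILE = 170         # cost of each 512px tile in detail "high"
PRICE_PER_MILLION_USD = 5.0   # assumption taken from the report; check current pricing


def high_detail_tokens(width: int, height: int) -> int:
    """Estimate detail "high" tokens for an image of the given pixel size."""
    # Step 1: shrink to fit within a 2048x2048 square, keeping aspect ratio.
    fit = min(1.0, 2048 / max(width, height))
    w, h = round(width * fit), round(height * fit)
    # Step 2: shrink so the shortest side is 768px.
    # Assumption: images already smaller than that are not upscaled.
    shrink = min(1.0, 768 / min(w, h))
    w, h = round(w * shrink), round(h * shrink)
    # Step 3: count the 512px tiles needed to cover the result.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return tiles * TOKENS_PER_TILE + BASE_TOKENS


def cost_usd(tokens: int) -> float:
    """Convert an input token count to dollars at the assumed price."""
    return tokens * PRICE_PER_MILLION_USD / 1_000_000


def image_part(url: str, needs_fine_detail: bool = False) -> dict:
    """Build an image content part that pins the detail level explicitly."""
    detail = "high" if needs_fine_detail else "low"
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}
```

Calling `high_detail_tokens(2550, 3300)` for a 300 DPI letter page and comparing `cost_usd` of that result with `cost_usd(LOW_DETAIL_TOKENS)` gives the per-page gap. Passing `needs_fine_detail=True` only for pages with signatures, stamps, or microtext keeps high detail as the exception.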
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:14:02.856142+00:00— report_created — created