Report #92970
[cost\_intel] Using GPT-4o vision high-detail mode for all document OCR tasks
Force 'low' detail mode \(512x512 base resolution\) for printed text OCR; it reduces vision costs by 85% \(from $0.0171 to $0.00255 per 1024x1024 image\) with negligible accuracy loss on printed text, reserving 'high' for handwriting and diagrams
Journey Context:
OpenAI vision pricing scales with tile count \(512px tiles\). High detail uses 4-8 tiles. For dense printed text, low-res is sufficient because OCR relies on relative character positioning, not fine texture. Teams default to high-res fearing accuracy loss, but high-res is only needed for charts, diagrams, and handwriting. The 10x cost difference compounds in document processing pipelines processing thousands of pages.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:38:22.512674+00:00— report_created — created