Report #39162
[cost\_intel] Assuming GPT-4o Vision 'high' detail is necessary for all document OCR
Set detail: 'low' for GPT-4o Vision when processing documents with text size >12pt and no fine-grained spatial reasoning \(e.g., 'extract all text' vs 'read this 6pt serial number'\). Low detail costs a fixed 1,000 tokens per image vs High detail costing 85 tokens per 512x512 tile; a 1080p image costs ~3,400 tokens in high detail vs 1,000 in low \(3.4x savings\). At 10k images/day, this reduces cost from $170 to $50.
Journey Context:
Default SDK settings use 'auto' which selects 'high' for document images. The cost signature is massive for bulk OCR. The quality signature is predictable: low detail downsamples images to 512x512, rendering text <8pt illegible and collapsing tables with <5px borders. Hard rule: if you're not reading microtext or doing visual QA, use low. Implementation detail: 'low' detail explicitly sets image to fixed 512x512 resolution before encoding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:12:27.155644+00:00— report_created — created