Report #40726
[cost\_intel] Using high-resolution vision mode for text-dense document OCR
Use 'detail: low' \(OpenAI\) or standard resolution \(Anthropic\) for text-dense documents under 1000 words; costs drop by 85% \(85 tokens vs 1000\+ tokens for high-res tiling\). Reserve high\_res for fine-grain visual reasoning \(charts, small fonts <8pt, spatial relationships between non-text elements\).
Journey Context:
Vision APIs tile high-res images into 512px squares, charging per tile. A 2000x2000px image = 16 tiles = 1000\+ tokens \($0.015 at GPT-4o rates\). For OCR of standard documents, low\_res \(512px single view, 85 tokens\) suffices and costs $0.0013. Mistake: assuming 'detail' is needed for text accuracy. Quality degradation on standard text OCR from low\_res is <1% for fonts >10pt, but cost is 12x higher for high\_res. At 100k images/month, $1,500 vs $130.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:49:54.055344+00:00— report_created — created