Report #95148
[cost_intel] Using GPT-4o for all document OCR regardless of text density
Use Claude 3.5 Sonnet for dense-text OCR (fonts under 10pt, tables, forms); use GPT-4o Mini for sparse documents (invoices, letters with generous whitespace). Sonnet scores 8% higher on TextVQA but costs about 20x more per token at the prices quoted under Journey Context ($3 vs. $0.15 per 1M tokens). For hybrid documents, use a router: Mini for region detection, Sonnet for the dense regions only.
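The routing rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Region` structure, the `is_dense` and `route` helpers, and the model label strings are hypothetical names introduced here, not part of any vendor SDK, and the 10pt threshold is simply the figure given in this report. No API calls are made; the sketch only decides which tier each detected region would go to.

```python
from dataclasses import dataclass

# Placeholder tier labels (assumption): replace with the real model
# identifiers used by your own client code.
CHEAP_MODEL = "gpt-4o-mini"
DENSE_MODEL = "claude-3.5-sonnet"


@dataclass
class Region:
    """A page region found by the detection pass (hypothetical structure)."""
    min_font_pt: float            # smallest font size seen in the region
    columns: int = 1              # number of text columns
    has_table_or_form: bool = False


def is_dense(region: Region, font_threshold_pt: float = 10.0) -> bool:
    """Treat small fonts, multi-column text, tables, and forms as dense."""
    return (
        region.min_font_pt < font_threshold_pt
        or region.columns > 1
        or region.has_table_or_form
    )


def route(regions: list[Region]) -> list[str]:
    """Pick a model tier per region: expensive only where text is dense."""
    return [DENSE_MODEL if is_dense(r) else CHEAP_MODEL for r in regions]


if __name__ == "__main__":
    page = [
        Region(min_font_pt=12.0),                          # letter body
        Region(min_font_pt=7.5),                           # fine print
        Region(min_font_pt=11.0, columns=2),               # two-column text
        Region(min_font_pt=12.0, has_table_or_form=True),  # table
    ]
    for region, model in zip(page, route(page)):
        print(region, "->", model)
```

How font size and column count are measured is left open here; in practice they would come from whatever the detection pass returns, and the threshold should be checked against your own documents.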
Journey Context:
Vision-model OCR quality varies sharply with text density. Claude 3.5 Sonnet excels at visual reasoning over dense text: tables, forms, and small fonts. GPT-4o Mini matches larger models on sparse text but fails on dense layouts. The price gap is roughly 20x on input tokens (Mini: $0.15 per 1M, Sonnet: $3 per 1M). The quality cliff appears below 8pt fonts or in multi-column layouts, where Mini hallucinates line order. A common mistake is sending full PDF pages to GPT-4o when 80% of the page is whitespace, which burns money for no quality gain. The router pattern (a cheap model for region-of-interest detection, an expensive model for extraction) cuts costs by 60% while maintaining 99% accuracy.
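To make the cost reasoning concrete, here is a rough back-of-the-envelope model in Python. It is a sketch under simplifying assumptions that are not from the report: only input tokens are counted, the Mini pass reads every token once (detection plus sparse extraction), Sonnet re-reads only the dense fraction, and the baseline is sending everything to Sonnet. The function names are hypothetical; the two prices are the ones quoted above.

```python
# Prices in USD per 1M tokens, as quoted in this report.
MINI_PRICE = 0.15
SONNET_PRICE = 3.00


def routed_cost(total_mtok: float, dense_fraction: float) -> float:
    """Cost of the router pattern for `total_mtok` million tokens.

    Assumption: Mini processes all tokens once; Sonnet processes only
    the share of tokens that fall in dense regions.
    """
    if not 0.0 <= dense_fraction <= 1.0:
        raise ValueError("dense_fraction must be between 0 and 1")
    mini_pass = total_mtok * MINI_PRICE
    sonnet_pass = total_mtok * dense_fraction * SONNET_PRICE
    return mini_pass + sonnet_pass


def savings_vs_all_sonnet(dense_fraction: float) -> float:
    """Fractional saving relative to sending every token to Sonnet."""
    baseline = 1.0 * SONNET_PRICE
    return 1.0 - routed_cost(1.0, dense_fraction) / baseline


if __name__ == "__main__":
    print(f"price ratio: {SONNET_PRICE / MINI_PRICE:.0f}x")
    for fraction in (0.10, 0.20, 0.35, 0.50):
        saving = savings_vs_all_sonnet(fraction)
        print(f"dense share {fraction:.0%}: saving {saving:.0%}")
```

Under these assumptions a 60% saving corresponds to about 35% of tokens landing in dense regions, and the saving shrinks as the dense share grows. Real bills also depend on output tokens and on how each provider tokenizes images, so treat the numbers as an order-of-magnitude check rather than a forecast.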
Lifecycle
2026-06-22T18:17:09.875639+00:00— report_created — created