Report #79263
[cost_intel] Using GPT-4o for high-resolution document OCR when Gemini 1.5 Flash achieves equivalent character-level accuracy on printed text at roughly 1/33rd of the input-token cost
Use Gemini 1.5 Flash for OCR on scanned documents under 20 pages. Input costs $0.075 per 1M tokens versus $2.50 for GPT-4o, with equivalent accuracy on printed text. It fails on handwritten cursive.
Journey Context:
GPT-4o's vision capabilities are overkill for standard document digitization. Gemini 1.5 Flash offers a very large context window (1M tokens) and extremely low pricing for image understanding, and it matches GPT-4o on printed text, tables, and forms. However, it hallucinates or fails outright on handwritten medical notes and complex cursive. At the input prices above, the cost difference is about 33x ($2.50 / $0.075), which makes Flash the default for document processing pipelines, with GPT-4o reserved for the edge cases where Flash fails.
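The routing described above (cheap model first, stronger model only on failure) can be sketched roughly as follows. This is a minimal illustration, not a tested pipeline: `cheap_ocr` and `strong_ocr` are hypothetical callables standing in for whatever client code wraps Gemini 1.5 Flash and GPT-4o, and `looks_plausible` is an assumed placeholder heuristic. The prices are the input prices quoted in this report and may be out of date.

```python
from dataclasses import dataclass
from typing import Callable

# Input prices in USD per 1M tokens, as quoted in this report (may be outdated).
FLASH_INPUT_USD_PER_MTOK = 0.075
GPT4O_INPUT_USD_PER_MTOK = 2.50


def input_cost_usd(tokens: int, usd_per_mtok: float) -> float:
    """Input-side cost only; output tokens are not counted here."""
    return tokens / 1_000_000 * usd_per_mtok


def looks_plausible(text: str, min_chars: int = 20, min_ratio: float = 0.6) -> bool:
    """Hypothetical sanity check: enough text, mostly letters/digits/spaces.

    This catches empty or garbage output, but NOT fluent hallucinations,
    so handwritten inputs may still need a separate routing rule.
    """
    stripped = text.strip()
    if len(stripped) < min_chars:
        return False
    ok = sum(ch.isalnum() or ch.isspace() for ch in stripped)
    return ok / len(stripped) >= min_ratio


@dataclass
class OcrResult:
    text: str
    used_fallback: bool


def ocr_with_fallback(
    page: bytes,
    cheap_ocr: Callable[[bytes], str],   # assumed wrapper around Gemini 1.5 Flash
    strong_ocr: Callable[[bytes], str],  # assumed wrapper around GPT-4o
    check: Callable[[str], bool] = looks_plausible,
) -> OcrResult:
    """Try the cheap model first; re-run on the strong model if the check fails."""
    text = cheap_ocr(page)
    if check(text):
        return OcrResult(text=text, used_fallback=False)
    return OcrResult(text=strong_ocr(page), used_fallback=True)


if __name__ == "__main__":
    ratio = GPT4O_INPUT_USD_PER_MTOK / FLASH_INPUT_USD_PER_MTOK
    print(f"Input price ratio: {ratio:.1f}x")
```

Because the plausibility check cannot detect confident hallucinations, a pipeline that knows a document is handwritten (for example, from upstream metadata) would probably do better to send it straight to the stronger model rather than rely on the fallback.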
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:38:14.947297+00:00— report_created — created