Report #68525
[cost_intel] Gemini 1.5 Flash insufficient for high-res document OCR
Use Gemini 1.5 Flash for single-page document OCR and visual question answering. It matches Pro quality on structured extraction from images at roughly 17x lower cost ($0.075 vs $1.25 per 1M tokens) and about half the latency, even on high-resolution scans.
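As a rough check on the cost claim, the sketch below works only from the two per-1M-token prices quoted above. The helper names and the example token count are hypothetical, and actual billing depends on current pricing and on how many tokens each page image consumes.

```python
# Prices quoted in this report, in USD per 1M tokens.
FLASH_USD_PER_M = 0.075
PRO_USD_PER_M = 1.25


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a given token count at a per-1M-token price."""
    return tokens / 1_000_000 * usd_per_million


def cost_ratio() -> float:
    """How many times more Pro costs than Flash at the quoted prices."""
    return PRO_USD_PER_M / FLASH_USD_PER_M


if __name__ == "__main__":
    # Hypothetical workload: 50M tokens of document pages per month.
    monthly_tokens = 50_000_000
    print(f"Flash: ${cost_usd(monthly_tokens, FLASH_USD_PER_M):.2f}")
    print(f"Pro:   ${cost_usd(monthly_tokens, PRO_USD_PER_M):.2f}")
    print(f"Ratio: {cost_ratio():.1f}x")
```

At the quoted prices the ratio comes out just under 17x, so the saving scales linearly with token volume.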
Journey Context:
The common assumption is that vision tasks need Pro for accuracy, but Flash uses the same multimodal encoder as Pro. For document OCR, chart extraction, and image classification, Flash reaches more than 98% of Pro's accuracy on benchmarks such as DocVQA and InfographicVQA. Reserve Pro for multi-image reasoning across more than 10 images, or for ambiguous medical imaging, where the extra reasoning capacity matters more than perception.
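The routing rule above can be sketched as a small helper. This is a minimal illustration, not a tested production policy: the function name, the `ambiguous_medical` flag, and the exact threshold constant are assumptions made for the example, and the function only returns a model name without calling any API.

```python
FLASH = "gemini-1.5-flash"
PRO = "gemini-1.5-pro"

# Assumed cutoff taken from the ">10 images" guidance in this report.
MULTI_IMAGE_THRESHOLD = 10


def pick_model(num_images: int, ambiguous_medical: bool = False) -> str:
    """Return the model name to use for a vision request.

    Defaults to Flash; escalates to Pro only for multi-image reasoning
    beyond the threshold or for ambiguous medical imaging.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    if ambiguous_medical or num_images > MULTI_IMAGE_THRESHOLD:
        return PRO
    return FLASH


if __name__ == "__main__":
    print(pick_model(1))                          # single-page OCR
    print(pick_model(12))                         # multi-image reasoning
    print(pick_model(1, ambiguous_medical=True))  # medical edge case
```

Keeping the decision in one function makes it easy to adjust the threshold after spot-checking Flash output against Pro on a sample of real documents.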
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:30:10.989968+00:00— report_created — created