Report #93909
[cost\_intel] Defaulting to Gemini 1.5 Pro for all document OCR and VQA tasks, missing that Flash nearly matches Pro on structured extraction at roughly 1/17th the cost
Use Gemini 1.5 Flash for high-volume document OCR, table extraction, and visual question answering on documents; on structured extraction tasks it reaches more than 95% of Pro's F1 while costing $0.075 per 1M tokens versus Pro's $1.25 per 1M.
Journey Context:
Teams often assume 'Pro' means 'better at everything'. For document understanding \(PDFs, scans\), however, Flash and Pro often differ by less than 2% on accuracy metrics such as F1 for key-value extraction. The capability cliff appears only on: \(1\) reasoning across more than 100 pages, \(2\) complex instruction following with constraints, or \(3\) nuanced visual reasoning \(e.g., 'explain the irony in this cartoon'\). For a task like 'extract invoice date and total', Flash is sufficient. Cost math: processing 10M pages/year at an assumed 1k tokens/page is 10B tokens/year, which costs $12,500 on Pro and $750 on Flash. The risk: Flash has a slightly higher hallucination rate on messy handwriting. Mitigation: run Flash with a validation prompt \(e.g., 'confirm the date format'\), which adds negligible cost, and escalate to Pro only when validation fails. If about 5% of cases escalate, the blended cost is roughly $1,375 \($750 for Flash on every page plus $625 for Pro on the escalated 5%\), versus $12,500 for Pro alone.
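The cost math and the escalate-on-failure pattern can be sketched as below. This is a minimal illustration, not a tested integration: the prices are the ones quoted in this report, the cost of the validation prompt itself is ignored, every page is assumed to go through Flash first, and `run_flash`, `run_pro`, and `is_valid` are hypothetical callables standing in for whatever model client and validation check you actually use.

```python
from typing import Any, Callable, Tuple

# Prices quoted in this report (USD per 1M tokens). Verify against current pricing.
PRO_PRICE_PER_M = 1.25
FLASH_PRICE_PER_M = 0.075


def annual_cost(pages: int, tokens_per_page: int, price_per_m: float) -> float:
    """USD cost of processing `pages` pages at `price_per_m` USD per 1M tokens."""
    return pages * tokens_per_page / 1_000_000 * price_per_m


def blended_cost(pages: int, tokens_per_page: int, escalation_rate: float) -> float:
    """Flash on every page, plus Pro on the fraction that fails validation.

    Assumption: the validation prompt's own token cost is negligible and omitted.
    """
    flash = annual_cost(pages, tokens_per_page, FLASH_PRICE_PER_M)
    pro = annual_cost(pages, tokens_per_page, PRO_PRICE_PER_M) * escalation_rate
    return flash + pro


def extract_with_escalation(
    page: Any,
    run_flash: Callable[[Any], Any],  # hypothetical: calls the cheaper model
    run_pro: Callable[[Any], Any],    # hypothetical: calls the stronger model
    is_valid: Callable[[Any], bool],  # hypothetical: validation check on the result
) -> Tuple[Any, str]:
    """Try the cheap model first; fall back to the strong one if validation fails."""
    result = run_flash(page)
    if is_valid(result):
        return result, "flash"
    return run_pro(page), "pro"


if __name__ == "__main__":
    pages, tokens = 10_000_000, 1_000
    print(f"Pro only:   ${annual_cost(pages, tokens, PRO_PRICE_PER_M):,.0f}")
    print(f"Flash only: ${annual_cost(pages, tokens, FLASH_PRICE_PER_M):,.0f}")
    print(f"Blended 5%: ${blended_cost(pages, tokens, 0.05):,.0f}")
```

Because Flash runs on every page before any escalation, the blended figure is the full Flash cost plus the Pro cost for the escalated share. The escalation rate is the main lever: it is worth measuring on a sample of your own documents before committing to a budget.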
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:12:47.055676+00:00— report_created — created