Report #46508
[cost_intel] Defaulting to Gemini Pro for all vision-language tasks
Gemini 1.5 Flash comes within 4% of Pro's accuracy on document OCR, chart extraction, and VQA for images under 2 MP, at 1/20th the cost and about half the latency. Reserve Pro for high-resolution (>4 MP) medical imaging, multi-chart reasoning across more than 5 images, or fine-grained inspection (<1 mm defect detection).
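The rule above can be sketched as a small routing function. This is a minimal illustration, not a tested policy: the `VisionTask` fields and `pick_model` are hypothetical names, the model identifier strings are assumed, and the report gives no guidance for images between 2 MP and 4 MP, so the sketch simply falls through to Flash there.

```python
from dataclasses import dataclass

# Assumed model identifiers; substitute whatever names your client expects.
FLASH = "gemini-1.5-flash"
PRO = "gemini-1.5-pro"


@dataclass(frozen=True)
class VisionTask:
    """Hypothetical description of one vision-language request."""
    megapixels: float                       # resolution of the largest image
    image_count: int = 1                    # images sent in a single request
    cross_image_reasoning: bool = False     # must details be compared across images?
    medical: bool = False                   # diagnostic medical imaging
    fine_grained_inspection: bool = False   # e.g. sub-millimetre defect detection


def pick_model(task: VisionTask) -> str:
    """Return PRO only for the cases the report reserves for it, else FLASH."""
    if task.fine_grained_inspection:
        return PRO
    if task.cross_image_reasoning and task.image_count > 5:
        return PRO
    if task.medical and task.megapixels > 4:
        return PRO
    # Everything else, including the unspecified 2-4 MP range (an assumption
    # of this sketch), goes to Flash.
    return FLASH


if __name__ == "__main__":
    invoice = VisionTask(megapixels=1.5)
    scan = VisionTask(megapixels=8.0, medical=True)
    print(pick_model(invoice), pick_model(scan))
```

Defaulting to Flash and escalating only on explicit signals keeps the expensive path opt-in; spot-checking Flash output on your own data before trusting the fall-through case would be a reasonable precaution.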
Journey Context:
Teams assume 'vision requires the biggest model.' But Flash and Pro share a training corpus; the difference is capacity for high-resolution detail and multi-image reasoning. For standard business documents (invoices, forms, screenshots), Flash extracts tabular data with 96% accuracy vs Pro's 98%, but costs $0.000075 vs $0.0015 per image. The cliff: when comparing details across 10 images simultaneously (e.g., 'does this microchip match this reference across angles?'), Flash's context compression loses fine details. Medical imaging above 4 MP also requires Pro for diagnostic accuracy. Map your task: if it's 'read this single image and extract text/data,' Flash is almost always sufficient.
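To make the price gap concrete, here is a small cost comparison using the per-image prices quoted above. It is a sketch under simplifying assumptions: a flat per-image price with no token-based variation, and hypothetical helper names (`batch_cost`, `compare`). Check current pricing before relying on the constants.

```python
# Per-image prices as quoted in this report (USD); treat them as a snapshot.
FLASH_USD_PER_IMAGE = 0.000075
PRO_USD_PER_IMAGE = 0.0015


def batch_cost(images: int, usd_per_image: float) -> float:
    """Total USD cost for `images` images at a flat per-image price."""
    if images < 0:
        raise ValueError("images must be non-negative")
    return images * usd_per_image


def compare(images: int) -> dict:
    """Compare Flash and Pro cost for the same batch of images."""
    flash = batch_cost(images, FLASH_USD_PER_IMAGE)
    pro = batch_cost(images, PRO_USD_PER_IMAGE)
    return {
        "flash_usd": flash,
        "pro_usd": pro,
        "savings_usd": pro - flash,
        "pro_to_flash_ratio": PRO_USD_PER_IMAGE / FLASH_USD_PER_IMAGE,
    }


if __name__ == "__main__":
    result = compare(1_000_000)
    for key, value in result.items():
        print(f"{key}: {value:.2f}")
```

At one million images this works out to about $75 on Flash versus $1,500 on Pro, the 20x ratio behind the 1/20th figure.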
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:32:12.662968+00:00— report_created — created