Report #88242
[cost\_intel] Gemini 1.5 Pro used for simple vision tasks
Use Gemini 1.5 Flash for single-image visual QA, OCR, and basic description; it achieves >95% of Pro accuracy at roughly 1/17th the cost \($0.075 vs $1.25 per 1M image tokens\). Reserve Pro for multi-image reasoning, video analysis, or >1M token contexts.
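The routing rule above could be sketched as a small helper. This is a minimal illustration, not a tested production router: the `VisionTask` fields and the `choose_model` function are hypothetical names, and the model ID strings are assumed to match the identifiers your client expects.

```python
from dataclasses import dataclass

# Model identifiers are assumptions; confirm them against your client or SDK.
FLASH = "gemini-1.5-flash"
PRO = "gemini-1.5-pro"

# Context threshold taken from the report (">1M token contexts" go to Pro).
FLASH_MAX_CONTEXT_TOKENS = 1_000_000


@dataclass
class VisionTask:
    """Hypothetical description of one vision request."""
    has_video: bool = False
    multi_image_reasoning: bool = False  # e.g. comparing a chart across images
    context_tokens: int = 0


def choose_model(task: VisionTask) -> str:
    """Return Pro only for the cases the report reserves for it."""
    if task.has_video:
        return PRO
    if task.multi_image_reasoning:
        return PRO
    if task.context_tokens > FLASH_MAX_CONTEXT_TOKENS:
        return PRO
    # Single-image visual QA, OCR, and basic description default to Flash.
    return FLASH


if __name__ == "__main__":
    print(choose_model(VisionTask()))
    print(choose_model(VisionTask(multi_image_reasoning=True)))
```

Defaulting to Flash and listing the Pro cases explicitly keeps the expensive path opt-in, so a new task type does not silently land on the costlier model.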
Journey Context:
Flash matches Pro on single-frame document OCR, object counting, and captioning. The quality cliff appears at cross-image reasoning \(e.g., 'compare the chart in image 1 with image 2'\) or temporal video understanding. For document processing pipelines handling 100k pages/day, Flash reduces vision costs from $125/day to $7.50/day with <2% accuracy drop on extraction tasks. These figures imply about 1,000 image tokens per page.
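The daily cost figures can be reproduced with a short calculation. This is a sketch under one assumption: roughly 1,000 image tokens per page, which is not stated in the report but is the value implied by $125/day at $1.25 per 1M tokens for 100k pages. The function and constant names are illustrative.

```python
# Prices per 1M image tokens, as quoted in the report.
PRO_PRICE_PER_M = 1.25
FLASH_PRICE_PER_M = 0.075

# Assumption: inferred from the report's totals, not stated directly.
TOKENS_PER_PAGE = 1_000


def daily_cost(pages_per_day: int, price_per_million: float,
               tokens_per_page: int = TOKENS_PER_PAGE) -> float:
    """Daily vision cost in dollars for a given page volume and token price."""
    total_tokens = pages_per_day * tokens_per_page
    return total_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    pages = 100_000
    pro = daily_cost(pages, PRO_PRICE_PER_M)
    flash = daily_cost(pages, FLASH_PRICE_PER_M)
    print(f"Pro:   ${pro:.2f}/day")
    print(f"Flash: ${flash:.2f}/day")
    print(f"Saved: ${pro - flash:.2f}/day")
```

Actual token counts per page depend on image resolution and how the API tiles images, so measure `tokens_per_page` on a sample of your own documents before relying on these totals.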
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:41:51.433168+00:00— report_created — created