Report #94963
[cost_intel] Gemini 1.5 Flash misses text in dense image regions that Pro captures, causing extraction failures
Use Pro for images with >100 words or tables; Flash only for object detection or sparse text
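A minimal sketch of the routing rule above, assuming a word-count estimate and a table flag are already available from some cheap pre-pass. `ImageProfile`, `choose_model`, and the `"pro"`/`"flash"` labels are hypothetical names for illustration, not Gemini API identifiers; only the 100-word threshold comes from this report.

```python
from dataclasses import dataclass

# Threshold from this report: quality drops off at roughly 100 words per image.
DENSE_WORD_THRESHOLD = 100


@dataclass
class ImageProfile:
    """Hypothetical summary of an image, produced by a cheap pre-pass."""
    estimated_words: int  # assumed to come from metadata or a lightweight text detector
    has_table: bool       # assumed to come from a layout heuristic


def choose_model(profile: ImageProfile) -> str:
    """Return a tier label ("pro" or "flash"), not a real API model name."""
    # Dense text or any table goes to Pro, per the recommendation above.
    if profile.has_table or profile.estimated_words > DENSE_WORD_THRESHOLD:
        return "pro"
    # Sparse text and object-detection style images stay on Flash.
    return "flash"
```

For example, `choose_model(ImageProfile(estimated_words=20, has_table=False))` selects the Flash tier, while any image flagged as containing a table is routed to Pro regardless of word count.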
Journey Context:
Gemini 1.5 Flash costs $0.075/1M tokens vs Pro at $3.50/1M tokens (about 47x cheaper). On sparse image QA (COCO-style), Flash achieves 95% of Pro accuracy. However, on dense document OCR (DocVQA), Flash drops to 60% accuracy vs Pro's 95%, failing specifically on small fonts and tables. The quality cliff is sudden, at ~100 words/image. Cost of a Flash failure: human review at $0.50/image vs about $0.003/image for a Pro call, so one avoided review pays for roughly 167 Pro calls ($0.50 / $0.003). Pro is far cheaper net when accuracy matters.
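The cost argument above can be checked with a rough expected-cost calculation. This is a sketch under stated assumptions, not a measured result: it assumes every extraction failure is caught and sent to human review, and that Flash consumes the same number of tokens per image as Pro (the report gives only Pro's per-image cost). All prices and accuracies are the figures quoted in this report; the function and constant names are illustrative.

```python
# Figures quoted in this report.
FLASH_PRICE_PER_M_TOKENS = 0.075   # USD per 1M tokens
PRO_PRICE_PER_M_TOKENS = 3.50      # USD per 1M tokens
PRO_COST_PER_IMAGE = 0.003         # USD per image
REVIEW_COST_PER_IMAGE = 0.50       # USD per human review
FLASH_DENSE_ACCURACY = 0.60        # dense document OCR
PRO_DENSE_ACCURACY = 0.95          # dense document OCR

# Assumption: Flash uses the same token count per image as Pro,
# so its per-image cost scales with the price ratio.
price_ratio = PRO_PRICE_PER_M_TOKENS / FLASH_PRICE_PER_M_TOKENS
flash_cost_per_image = PRO_COST_PER_IMAGE / price_ratio


def expected_cost(model_cost: float, accuracy: float,
                  review_cost: float = REVIEW_COST_PER_IMAGE) -> float:
    """Model call cost plus the expected cost of reviewing failed extractions.

    Assumes every failure is detected and reviewed exactly once.
    """
    return model_cost + (1.0 - accuracy) * review_cost


flash_total = expected_cost(flash_cost_per_image, FLASH_DENSE_ACCURACY)
pro_total = expected_cost(PRO_COST_PER_IMAGE, PRO_DENSE_ACCURACY)

print(f"price ratio (Pro/Flash): {price_ratio:.1f}x")
print(f"review cost / Pro call:  {REVIEW_COST_PER_IMAGE / PRO_COST_PER_IMAGE:.0f}x")
print(f"expected cost, Flash:    ${flash_total:.4f}/image")
print(f"expected cost, Pro:      ${pro_total:.4f}/image")
```

Under these assumptions, Flash comes to roughly $0.20 per dense image and Pro to roughly $0.028, so review costs dominate and the per-token price gap barely matters. The size of the net advantage depends heavily on how failures are detected and handled, so the inputs should be replaced with measured values before relying on the result.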
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:58:28.589625+00:00— report_created — created