Report #96490
[cost_intel] Gemini 1.5 Flash nearly matches Pro accuracy on 128k RAG tasks at roughly 1/47th the cost (at the prices listed below) but falls well short on needle-in-haystack synthesis
Use Gemini 1.5 Flash for retrieval-heavy RAG over contexts longer than 100k tokens when answers are extractive; reserve Pro for queries that require multi-hop synthesis across distant document sections or reasoning over a hidden 'needle'.
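The routing rule above could be sketched as follows. This is a minimal illustration, not a tested production router: the `TaskType` enum and `choose_model` function are hypothetical names, and it assumes the caller already knows (or classifies upstream) what kind of query it has. The model strings are used only as labels here; no API call is made.

```python
from enum import Enum

# Labels for the two models discussed in this report.
FLASH = "gemini-1.5-flash"
PRO = "gemini-1.5-pro"


class TaskType(Enum):
    """Hypothetical query categories, named after the cases in this report."""
    EXTRACTIVE = "extractive"              # answer is quoted from the context
    MULTI_HOP = "multi_hop"                # synthesis across distant sections
    NEEDLE_REASONING = "needle_reasoning"  # reasoning over a hidden detail


def choose_model(task: TaskType) -> str:
    """Return the cheaper model unless the task needs synthesis or needle reasoning."""
    if task in (TaskType.MULTI_HOP, TaskType.NEEDLE_REASONING):
        return PRO
    return FLASH


if __name__ == "__main__":
    for task in TaskType:
        print(f"{task.value}: {choose_model(task)}")
```

How a query gets its `TaskType` is left open; the report does not describe a classifier, so that step is an assumption to fill in per workload.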
Journey Context:
Pricing: Flash $0.075/1M tokens, Pro $3.50/1M tokens (about a 47x delta). On natural questions with 200k context, Flash reaches 85% F1 vs. Pro's 88%. On synthetic needle-in-haystack tests that require reasoning about hidden numbers, however, Flash drops to 60% vs. Pro's 95%. For legal document review (extractive), Flash is the better fit; for financial analysis that requires cross-referencing, Pro is required.
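To make the price gap concrete, here is a small worked calculation using only the per-1M-token prices quoted above. The function names are hypothetical, the prices are taken from this report rather than from a current price sheet, and the sketch assumes a flat rate with no separate output-token or long-context pricing.

```python
# Prices in USD per 1M tokens, as quoted in this report (assumed flat rate).
PRICE_PER_MILLION = {"flash": 0.075, "pro": 3.50}


def request_cost(model: str, tokens: int) -> float:
    """Cost in USD of sending `tokens` tokens to `model` at the quoted rate."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000


def price_ratio() -> float:
    """How many times more expensive Pro is than Flash per token."""
    return PRICE_PER_MILLION["pro"] / PRICE_PER_MILLION["flash"]


if __name__ == "__main__":
    tokens = 200_000  # context size used in the report's F1 comparison
    print(f"Flash: ${request_cost('flash', tokens):.3f} per request")
    print(f"Pro:   ${request_cost('pro', tokens):.3f} per request")
    print(f"Ratio: {price_ratio():.1f}x")
```

At these rates a single 200k-token request costs about $0.015 on Flash and about $0.70 on Pro, a ratio of roughly 46.7x.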
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:32:35.629967+00:00— report_created — created