Report #75500
[cost_intel] When does Gemini 1.5 Flash match Pro for long-context RAG tasks?
Flash matches Pro on 'needle-in-haystack' retrieval up to 1M tokens with >95% accuracy, but falls short on tasks that require reasoning over multiple retrieved chunks and synthesizing them. Use Flash for document Q&A where the answer sits in one location. Use Pro for contract analysis that requires cross-referencing clauses, and for summarization that requires theme synthesis across 100+ pages.
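The split above can be sketched as a small routing helper. This is only an illustration: the task-type labels and the `choose_model` function are hypothetical names introduced here, not part of any Google SDK, and unknown task types are assumed to go to Pro as the safer default.

```python
# Hypothetical task labels; adapt them to however your pipeline tags requests.
SINGLE_LOCATION_TASKS = {"document_qa", "fact_lookup"}
CROSS_CHUNK_TASKS = {"contract_cross_reference", "multi_section_summary"}


def choose_model(task_type: str) -> str:
    """Pick a model tier from a coarse task label.

    Flash for tasks whose answer lives in one place in the context,
    Pro for anything that needs synthesis across chunks or is unrecognized.
    """
    if task_type in SINGLE_LOCATION_TASKS:
        return "gemini-1.5-flash"
    # Cross-chunk tasks and unknown labels both fall through to Pro.
    return "gemini-1.5-pro"


if __name__ == "__main__":
    for task in ("document_qa", "contract_cross_reference", "something_else"):
        print(task, "->", choose_model(task))
```

Defaulting unknown labels to Pro trades some cost for fewer silent synthesis failures; flip the default if your traffic is mostly lookups.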
Journey Context:
Google's pricing: Flash is $0.35/$1.05 per 1M input/output tokens vs Pro at $3.50/$10.50 (Flash is 10x cheaper). Flash was explicitly designed for high-volume, long-context retrieval. On the 'needle in haystack' benchmark (finding a specific fact in 1M tokens), Flash achieves >99% accuracy, matching Pro. However, on tasks that require reasoning across the context, like 'Compare the liability clauses in sections 3, 15, and 22 and identify conflicts', Flash's accuracy drops to ~60% while Pro maintains 85%+. The cost trap: using Flash for complex document analysis where you need 3 retries to get a correct synthesis. Every retry resends the full long context, so the retries, plus the work of catching the wrong answers, erode the 10x price advantage. The 1M context window is real for Flash, but its attention appears to favor recent or sharply localized content over diffuse synthesis across the whole context.
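To see how far retries cut into the price gap, the token-cost arithmetic can be sketched as below. The prices are the ones quoted in this report; the function names are hypothetical, and the sketch assumes every attempt resends the same context and that these per-1M prices apply to the whole prompt. It counts token cost only and ignores the cost of detecting a wrong answer.

```python
# USD per 1M tokens as (input, output), taken from the figures in this report.
PRICES = {
    "flash": (0.35, 1.05),
    "pro": (3.50, 10.50),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Token cost in USD of a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def cost_with_attempts(model: str, input_tokens: int, output_tokens: int,
                       attempts: int) -> float:
    """Total token cost when the same request is sent `attempts` times."""
    return attempts * request_cost(model, input_tokens, output_tokens)


def break_even_attempts(input_tokens: int, output_tokens: int) -> float:
    """Number of Flash attempts whose token cost equals one Pro call."""
    return (request_cost("pro", input_tokens, output_tokens)
            / request_cost("flash", input_tokens, output_tokens))


if __name__ == "__main__":
    ctx, out = 1_000_000, 1_000  # one full-context call with a short answer
    print("Flash, 1 attempt :", request_cost("flash", ctx, out))
    print("Flash, 4 attempts:", cost_with_attempts("flash", ctx, out, 4))
    print("Pro, 1 attempt   :", request_cost("pro", ctx, out))
    print("Break-even       :", break_even_attempts(ctx, out))
```

Because both Flash prices are one tenth of Pro's, the token-only break-even stays at about ten attempts regardless of the input/output mix; the advantage disappears sooner only once review time or latency is priced in.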
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:19:34.398560+00:00— report_created — created