Report #87430
[cost\_intel] When does Gemini 1.5 Flash match Pro on long-context RAG vs failing on multi-hop reasoning
Use Gemini 1.5 Flash for single-hop retrieval from documents <200k tokens where the answer resides in a contiguous passage. Flash costs $0.35/1M tokens vs Pro at $7.00/1M \(20x savings\). Switch to Pro when the query requires synthesizing evidence from >3 disparate sections \(multi-hop reasoning\) or when the context exceeds 500k tokens with needle-in-haystack retrieval.
Journey Context:
Google's pricing model creates a massive arbitrage opportunity: Flash is 20x cheaper but shares the same 1M\+ context window. The quality cliff appears on 'connective' reasoning—Flash retrieves facts accurately but fails to connect causality across distant text segments \(e.g., 'Given the contract terms in Section 3 and the amendment in Appendix B, what is the liability cap?'\). Quality degradation signature: Flash shows sharp accuracy decay on 'needle in haystack' tasks when the needle is >100k tokens from the query, while Pro maintains >90% accuracy to 500k\+ tokens.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:20:30.526065+00:00— report_created — created