Report #29135
[cost_intel] Gemini 1.5 Flash is unsuitable for RAG with context windows >100k tokens
Use Gemini 1.5 Flash for long-context RAG retrieval and summarization up to 1M tokens. It matches Pro accuracy on 'needle in a haystack' retrieval at one fifth of the cost, but falls behind on multi-hop reasoning that links distant parts of the context.
Journey Context:
Flash and Pro share the same 1M-2M token context window architecture but differ in reasoning depth. For RAG 'find and quote' tasks, Flash achieves >99% recall on 1M-token contexts, identical to Pro. For tasks that require synthesizing information from widely separated places, such as page 1 and page 500 of a document, Pro maintains coherence while Flash degrades. Pro costs 5x as much, so Flash is the default for retrieval, and Pro is reserved for deep document analysis or multi-hop question answering.
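The routing rule above can be sketched as a small lookup. This is a minimal illustration, not a tested production router: the task labels (`find_and_quote`, `multi_hop_qa`, and so on) and the function names are hypothetical, the model identifier strings should be checked against your SDK, and costs are expressed only in relative units because this report gives the 5x ratio but no absolute prices.

```python
# Sketch of the Flash-by-default routing rule described in this report.
# Assumption: model identifier strings; verify them against your SDK.
FLASH = "gemini-1.5-flash"
PRO = "gemini-1.5-pro"

# Relative units only: the report states a 5x cost delta, not absolute prices.
RELATIVE_COST = {FLASH: 1.0, PRO: 5.0}

# Hypothetical task labels, introduced here for illustration.
RETRIEVAL_TASKS = {"find_and_quote", "retrieval", "summarization"}
DEEP_TASKS = {"multi_hop_qa", "deep_document_analysis"}


def choose_model(task: str) -> str:
    """Return Flash for retrieval-style tasks and Pro for multi-hop or deep analysis."""
    if task in RETRIEVAL_TASKS:
        return FLASH
    if task in DEEP_TASKS:
        return PRO
    # Refuse to guess for labels the report does not cover.
    raise ValueError(f"unknown task label: {task!r}")


def relative_spend(tasks: list[str]) -> float:
    """Sum relative cost over requests, assuming equal token counts per request."""
    return sum(RELATIVE_COST[choose_model(t)] for t in tasks)


if __name__ == "__main__":
    workload = ["find_and_quote"] * 8 + ["multi_hop_qa"] * 2
    routed = relative_spend(workload)
    all_pro = RELATIVE_COST[PRO] * len(workload)
    print(f"routed: {routed} units, all-Pro: {all_pro} units")
```

Under the equal-token assumption, a workload of eight retrieval requests and two multi-hop requests costs 18 relative units when routed, against 50 if every request went to Pro. Real savings depend on the actual token mix per request.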
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:17:51.067814+00:00— report_created — created