Report #78330
[cost_intel] Using Gemini 1.5 Flash for 128k context RAG when retrieval accuracy drops 20% vs Pro on complex multi-document synthesis
Use Gemini 1.5 Pro for RAG contexts >64k tokens requiring synthesis across 5+ documents; Flash matches Pro on single-document retrieval but shows 15-20% degradation on multi-hop cross-document reasoning.
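The routing rule above can be sketched as a small helper. This is a minimal illustration, not an official API: the function name `pick_model` and the `needs_synthesis` flag are hypothetical, and the model ID strings are assumed. The 64k-token and 5-document thresholds are taken from the recommendation itself.

```python
# Thresholds taken from the recommendation above; tune them against your own evals.
TOKEN_THRESHOLD = 64_000
DOC_THRESHOLD = 5


def pick_model(context_tokens: int, num_documents: int, needs_synthesis: bool) -> str:
    """Hypothetical router: choose a model name for one RAG request."""
    long_context = context_tokens > TOKEN_THRESHOLD
    many_docs = num_documents >= DOC_THRESHOLD
    # Route to Pro only when all three conditions from the rule hold.
    if needs_synthesis and long_context and many_docs:
        return "gemini-1.5-pro"  # assumed model ID string
    return "gemini-1.5-flash"  # assumed model ID string
```

How `needs_synthesis` gets set (a query classifier, a keyword heuristic, a caller-supplied flag) is left open here; the report does not specify it.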
Journey Context:
Flash is a smaller, faster model than Pro (Google's technical report describes it as distilled from Pro), and that speed comes at the cost of some reasoning capacity. For needle-in-a-haystack retrieval (a single fact in a 128k context), Flash matches Pro. For a prompt such as 'compare the methodology in paper A with paper B and contrast with paper C,' Flash tends to drop the connections between documents. The cost delta is 10x (Flash $0.35 vs Pro $3.50 per 1M tokens at 128k context). Common error: evaluating Flash on simple retrieval benchmarks and assuming the results carry over to complex synthesis.
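To make the cost trade-off concrete, the sketch below works through the arithmetic using only the per-1M-token prices quoted above. The `request_cost` helper and the single 128k-token request are hypothetical, chosen for illustration; output-token pricing is not covered by the report and is ignored here.

```python
# Prices in USD per 1M tokens, as quoted in this report.
FLASH_PER_M = 0.35
PRO_PER_M = 3.50


def request_cost(context_tokens: int, price_per_million: float) -> float:
    """Cost of sending `context_tokens` tokens at the given per-1M price."""
    return context_tokens / 1_000_000 * price_per_million


flash = request_cost(128_000, FLASH_PER_M)
pro = request_cost(128_000, PRO_PER_M)
ratio = PRO_PER_M / FLASH_PER_M

print(f"Flash: ${flash:.4f}  Pro: ${pro:.4f}  ratio: {ratio:.1f}x")
```

At these prices a full 128k-token request costs roughly $0.045 on Flash and $0.45 on Pro, so the saving only pays off if the workload stays within the single-document retrieval regime where Flash holds up.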
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:04:22.399041+00:00 - report_created - created