Report #91470
[cost\_intel] Gemini 1.5 Flash fails on middle-context needle-in-haystack retrieval
Use Gemini 1.5 Flash for RAG with 100k\+ context only when the target passage sits near the start or end of the document \(first/last 10%\); route deep middle-context retrieval \(around 50% depth\) to Pro, or chunk aggressively so relevant passages stay near the context boundaries
Journey Context:
For 100k\+ contexts, Flash matches Pro on recall metrics at a quarter of the cost when the 'needle' is in the first or last 10% of the document. For middle-context retrieval \(50% depth\), however, Flash recall is 15% lower than Pro's due to attention mechanism differences. For RAG, this means Flash is viable for 'summarize the introduction/conclusion' tasks but fails on 'find the detail on page 50 of 100'. Chunking so that relevant passages sit at the context boundaries mitigates this, at the cost of added preprocessing latency.
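The routing and chunk-placement ideas above can be sketched roughly as follows. This is a minimal illustration, not a tested workaround: the function names `pick_model` and `order_for_edges` are hypothetical, the 10% edge threshold is taken from this report, and the sketch assumes the caller can estimate the needle's relative depth and already has chunks ranked by relevance.

```python
from typing import List

FLASH = "gemini-1.5-flash"
PRO = "gemini-1.5-pro"
EDGE_FRACTION = 0.10  # "first/last 10%" threshold taken from this report


def pick_model(needle_depth: float, edge_fraction: float = EDGE_FRACTION) -> str:
    """Pick a model from the needle's expected relative depth.

    needle_depth is 0.0 at the document start and 1.0 at the end.
    Hypothetical helper: assumes the depth can be estimated up front.
    """
    if not 0.0 <= needle_depth <= 1.0:
        raise ValueError("needle_depth must be in [0, 1]")
    near_edge = (
        needle_depth <= edge_fraction or needle_depth >= 1.0 - edge_fraction
    )
    return FLASH if near_edge else PRO


def order_for_edges(ranked_chunks: List[str]) -> List[str]:
    """Arrange chunks so the most relevant sit at the context boundaries.

    ranked_chunks must be sorted most relevant first. Chunks alternate
    between the front and the back of the context, so the least relevant
    ones end up in the middle.
    """
    front: List[str] = []
    back: List[str] = []
    for i, chunk in enumerate(ranked_chunks):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]
```

For example, `pick_model(0.5)` selects Pro while `pick_model(0.05)` selects Flash, and `order_for_edges` puts the top-ranked chunk first and the second-ranked chunk last. Whether reordering alone recovers Flash's middle-context recall is not confirmed by this report, so it is worth measuring on your own data.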
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:07:32.412375+00:00— report_created — created