Report #25197
[cost_intel] Assuming GPT-4o-mini is cost-optimal for RAG retrieval on documents >100k tokens
Use Gemini 1.5 Flash for contexts >100k tokens: it offers a 1M-token context window at $0.075 per 1M input tokens, versus $0.15 per 1M input tokens for GPT-4o-mini, along with higher needle-in-a-haystack accuracy.
Journey Context:
Google's Gemini 1.5 Flash achieves 98% needle-in-a-haystack accuracy at a 1M-token context; GPT-4o-mini drops to 60% at 128k tokens due to lost-in-the-middle effects. For legal or medical document Q&A that requires the full text in context, Flash dominates. Cost analysis: Flash input costs $0.075 per 1M tokens; GPT-4o-mini input costs $0.15 per 1M tokens, and its context window is capped at 128k tokens. Beyond 100k tokens of context, Flash is both cheaper and higher quality for retrieval.
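The cost comparison above can be sketched as a small per-request calculation. This is a minimal sketch, not a pricing tool: the rates and context limits are the figures quoted in this report and may be out of date, the model keys and the `input_cost` helper are hypothetical names used only for illustration, and flat per-token pricing is an assumption (providers may tier prices by prompt length, so check the current price sheets).

```python
from typing import Optional

# Input prices in USD per 1M tokens, as quoted in this report (may be stale).
PRICE_PER_1M_INPUT = {
    "gemini-1.5-flash": 0.075,
    "gpt-4o-mini": 0.15,
}

# Context window limits in tokens, as quoted in this report.
CONTEXT_LIMIT = {
    "gemini-1.5-flash": 1_000_000,
    "gpt-4o-mini": 128_000,
}


def input_cost(model: str, prompt_tokens: int) -> Optional[float]:
    """Return the input cost in USD for one request.

    Returns None when the prompt does not fit in the model's context window.
    Assumes a flat per-token rate (hypothetical simplification).
    """
    if prompt_tokens > CONTEXT_LIMIT[model]:
        return None
    return prompt_tokens / 1_000_000 * PRICE_PER_1M_INPUT[model]


if __name__ == "__main__":
    for tokens in (120_000, 500_000):
        for model in PRICE_PER_1M_INPUT:
            cost = input_cost(model, tokens)
            label = "does not fit" if cost is None else f"${cost:.4f}"
            print(f"{model} @ {tokens:,} tokens: {label}")
```

Under these quoted rates, a 120k-token prompt costs about $0.009 on Flash and about $0.018 on GPT-4o-mini, and a 500k-token prompt fits only in Flash's window.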
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:41:50.151826+00:00 — report_created — created