Report #42834
[cost\_intel] Using reasoning models instead of long-context instruct models for RAG
Use Gemini 1.5 Pro \(2M context\) or Claude 3 Opus for single-document QA; use a reasoning model only for synthesis across more than 5 documents that requires causal inference.
Journey Context:
Gemini 1.5 Pro's 2M-token window allows full-document ingestion at $3.50/1M tokens with 99% needle-in-a-haystack accuracy. o1-preview is limited to a 128k context and costs $60/1M tokens. Reasoning models win only when the answer requires connecting evidence from 10\+ disparate sections \(causal chains\). Signature: if the answer is verbatim-extractable, use a long-context instruct model.
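The routing rule above can be sketched as a small Python helper. This is a minimal illustration, not a tested production router: the `Query` fields, the function names, and the fallback to long-context when the prompt exceeds 128k tokens are all assumptions introduced here, and the prices are the figures quoted in this report rather than verified current list prices.

```python
from dataclasses import dataclass

# Figures quoted in this report; treat them as assumptions, not current list prices.
LONG_CONTEXT_PRICE_PER_M = 3.50    # USD per 1M tokens (Gemini 1.5 Pro, per the report)
REASONING_PRICE_PER_M = 60.00      # USD per 1M tokens (o1-preview, per the report)
REASONING_CONTEXT_LIMIT = 128_000  # o1-preview context window, per the report


@dataclass
class Query:
    """Hypothetical description of a RAG request; the field names are illustrative."""
    prompt_tokens: int            # total tokens that would be sent to the model
    num_documents: int            # number of source documents involved
    sections_to_connect: int      # disparate sections the answer must link together
    verbatim_extractable: bool    # True if the answer can be quoted directly
    needs_causal_inference: bool  # True if the answer requires a causal chain


def estimate_cost(tokens: int, price_per_m: float) -> float:
    """Return the cost in USD of processing `tokens` at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_m


def choose_model_class(q: Query) -> str:
    """Apply the report's heuristic and return 'long_context_instruct' or 'reasoning'."""
    # Signature from the report: verbatim-extractable answers go to long-context instruct.
    if q.verbatim_extractable:
        return "long_context_instruct"

    multi_doc_synthesis = q.num_documents > 5 and q.needs_causal_inference
    many_sections = q.sections_to_connect >= 10

    # Assumption: a prompt that does not fit the reasoning model's window
    # falls back to the long-context model instead of being chunked.
    fits = q.prompt_tokens <= REASONING_CONTEXT_LIMIT
    if (multi_doc_synthesis or many_sections) and fits:
        return "reasoning"
    return "long_context_instruct"
```

For example, `choose_model_class(Query(100_000, 8, 12, False, True))` returns `"reasoning"`, while the same query with `prompt_tokens=500_000` returns `"long_context_instruct"` because it exceeds the 128k window. `estimate_cost` can be used to compare the two quoted prices for a given prompt size before routing.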
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:21:49.857775+00:00— report_created — created