Report #88682
[cost\_intel] Paying reasoning premiums for simple RAG retrieval
Use instruct models for single-hop RAG; reserve reasoning models for synthesis across >3 documents or contradiction detection
Journey Context:
On NaturalQuestions \(single-hop\), GPT-4o reaches 85% accuracy versus 87% for o1, a 2-point gain that is not worth 15x the cost. On HotpotQA \(multi-hop\), however, o1 scores 82% versus 58% for GPT-4o, a 24-point gain \(about 41% relative\). The signature: if the answer requires connecting information across >2 chunks or resolving contradictions, reasoning models justify the cost; for extraction or lookup, use cheap models.
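The routing rule and the cost trade-off above can be sketched in a few lines of Python. This is a minimal illustration, not a tested production router: the `Query` fields, the tier labels, and both function names are hypothetical, and the sketch assumes something upstream already estimates how many chunks an answer needs and whether the chunks conflict. The chunk threshold defaults to the ">2 chunks" figure and is adjustable. The cost helper also assumes the 15x price ratio holds on both benchmarks, which the report states only for NaturalQuestions.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whichever models you actually deploy.
INSTRUCT = "instruct"
REASONING = "reasoning"


@dataclass
class Query:
    chunks_needed: int        # estimated number of chunks the answer must combine
    has_contradictions: bool  # True if the retrieved chunks disagree


def pick_tier(q: Query, chunk_threshold: int = 2) -> str:
    """Send a query to the reasoning tier only when the signature applies."""
    if q.has_contradictions or q.chunks_needed > chunk_threshold:
        return REASONING
    return INSTRUCT


def extra_cost_per_point(cost_ratio: float, base_pct: int, new_pct: int) -> float:
    """Extra cost (in units of one instruct call) per accuracy point gained."""
    return (cost_ratio - 1) / (new_pct - base_pct)


# Figures from the report: 85% vs 87% single-hop, 58% vs 82% multi-hop.
single_hop = extra_cost_per_point(15, 85, 87)  # 7.0 units per point
multi_hop = extra_cost_per_point(15, 58, 82)   # about 0.58 units per point
```

Under these assumptions, each accuracy point costs roughly 12 times more on the single-hop benchmark than on the multi-hop one, which is the asymmetry the recommendation relies on.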
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:26:19.676730+00:00— report_created — created