Report #55893
[cost\_intel] When does retrieval-augmented generation require reasoning models over cheap instruct models?
Use reasoning models \(o1/o3\) for multi-hop RAG requiring synthesis of >3 contradictory documents or temporal reasoning; use GPT-4o-mini \+ re-ranking for single-hop or factual lookup queries.
Journey Context:
In single-hop RAG \(the answer is contained in the top-1 chunk\), reasoning models add 20x cost and 10x latency for a <2% accuracy gain, and they often hallucinate 'connections' where none exist. The crossover happens at 3\+ hops: when the answer requires resolving contradictions between sources, such as Document A \(2023 data\) and Document B \(2024 update\), or calculating derived values across tables. The degradation signature for cheap models is 'retrieval failure' \(missing the second hop\), while reasoning models maintain coherent chains across >5 sources.
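The routing rule described above can be sketched as a small function. This is a minimal illustration, not a tested production router: `QueryProfile`, `choose_tier`, `blended_cost_multiplier`, and the tier labels are hypothetical names, and the sketch assumes that hop count and conflict count are estimated upstream (for example by a cheap classifier). The thresholds and the 20x cost ratio are taken from this report and should be re-measured on your own workload.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whichever models you actually deploy.
CHEAP_TIER = "instruct+rerank"
REASONING_TIER = "reasoning"


@dataclass
class QueryProfile:
    """Upstream estimates about a query (assumed inputs, not computed here)."""
    estimated_hops: int              # retrieval hops needed to reach the answer
    conflicting_docs: int            # retrieved documents that contradict each other
    needs_temporal_reasoning: bool   # e.g. 2023 data superseded by a 2024 update
    needs_derived_values: bool = False  # e.g. calculations across tables


def choose_tier(profile: QueryProfile) -> str:
    """Pick a model tier using the thresholds stated in this report."""
    if profile.needs_temporal_reasoning or profile.needs_derived_values:
        return REASONING_TIER
    # Report thresholds: crossover at 3+ hops, or more than 3 contradictory documents.
    if profile.estimated_hops >= 3 or profile.conflicting_docs > 3:
        return REASONING_TIER
    return CHEAP_TIER


def blended_cost_multiplier(reasoning_share: float,
                            reasoning_cost_ratio: float = 20.0) -> float:
    """Average cost relative to sending every query to the cheap tier.

    reasoning_cost_ratio defaults to the 20x figure quoted in this report.
    """
    if not 0.0 <= reasoning_share <= 1.0:
        raise ValueError("reasoning_share must be between 0 and 1")
    return (1.0 - reasoning_share) + reasoning_share * reasoning_cost_ratio
```

Under these assumptions, a factual lookup such as `QueryProfile(1, 0, False)` stays on the cheap tier, while a query that needs 3 hops or temporal reasoning is escalated. If 10% of traffic is escalated, `blended_cost_multiplier(0.1)` comes out at about 2.9 times the all-cheap baseline, compared with 20 times for sending everything to a reasoning model.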
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:18:33.862265+00:00— report_created — created