Report #41217
[cost\_intel] Multi-hop RAG cost optimization: chaining vs pure reasoning
For multi-hop QA over 10\+ documents \(e.g., 'Compare Q3 revenue across 3 subsidiaries'\), use 4o-mini for retrieval and reranking and o3-mini for synthesis. Running o3-mini end to end costs about 8x more than the hybrid pipeline for only a marginal accuracy gain. The cheap model handles entity matching; the reasoning model handles cross-document inference.
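A rough way to check the cost claim against your own workload is to price each stage by the tokens it reads. The sketch below is a minimal illustration, not a measurement: the function name `pipeline_cost`, the prices, and the token counts are all hypothetical placeholders, and it counts input tokens only, ignoring output and reasoning tokens.

```python
from typing import List, Tuple

# Each stage is (input_tokens, price_per_million_tokens).
Stage = Tuple[int, float]


def pipeline_cost(stages: List[Stage]) -> float:
    """Sum the input-token cost over all stages of a pipeline."""
    return sum(tokens / 1_000_000 * price for tokens, price in stages)


# Placeholder values for illustration only; substitute your provider's
# current prices and your own measured token counts.
CHEAP_PRICE = 1.0        # hypothetical price per 1M tokens, cheap model
REASONING_PRICE = 10.0   # hypothetical price per 1M tokens, reasoning model
ALL_DOC_TOKENS = 100_000   # hypothetical: tokens across all retrieved documents
FILTERED_TOKENS = 10_000   # hypothetical: tokens that survive filtering

# Pure reasoning: the reasoning model reads every document.
pure = pipeline_cost([(ALL_DOC_TOKENS, REASONING_PRICE)])

# Hybrid: the cheap model reads everything, the reasoning model reads
# only the filtered subset.
hybrid = pipeline_cost([
    (ALL_DOC_TOKENS, CHEAP_PRICE),
    (FILTERED_TOKENS, REASONING_PRICE),
])

cost_ratio = pure / hybrid      # how many times more the pure pipeline costs
savings = 1 - hybrid / pure     # fraction of cost saved by the hybrid pipeline
```

The ratio depends mostly on how much the filter stage shrinks the context and on the price gap between the two models, so both should be measured rather than assumed.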
Journey Context:
People often default to 'use the best model for everything' in RAG pipelines, but reasoning models are overkill for retrieval filtering, which is mostly simple entity matching. The hybrid approach divides the labor: a cheap model extracts candidates and filters out noise, and a reasoning model performs the cross-document logical inference. This cuts costs by 70-80% while retaining about 95% of full-reasoning accuracy.
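The division of labor described above can be sketched as a two-stage pipeline. This is a minimal sketch under stated assumptions, not the reporter's implementation: `Doc`, `filter_docs`, `synthesize`, and `hybrid_answer` are hypothetical names, the prompts are illustrative, and each model is passed in as a plain callable that takes a prompt string and returns text, so any client library can be wrapped to fit.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Doc:
    doc_id: str
    text: str


# Assumed interface: a model is any callable mapping a prompt to a completion.
ModelFn = Callable[[str], str]


def filter_docs(question: str, docs: List[Doc], cheap_model: ModelFn) -> List[Doc]:
    """Stage 1: the cheap model makes a yes/no relevance call per document."""
    kept = []
    for doc in docs:
        prompt = (
            f"Question: {question}\n"
            f"Document: {doc.text}\n"
            "Does this document mention an entity needed to answer the "
            "question? Answer YES or NO."
        )
        if cheap_model(prompt).strip().upper().startswith("YES"):
            kept.append(doc)
    return kept


def synthesize(question: str, docs: List[Doc], reasoning_model: ModelFn) -> str:
    """Stage 2: the reasoning model sees only the filtered documents."""
    context = "\n\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    prompt = f"Using only the sources below, answer: {question}\n\n{context}"
    return reasoning_model(prompt)


def hybrid_answer(
    question: str,
    docs: List[Doc],
    cheap_model: ModelFn,
    reasoning_model: ModelFn,
) -> str:
    """Run cheap filtering first, then reasoning over the survivors."""
    relevant = filter_docs(question, docs, cheap_model)
    return synthesize(question, relevant, reasoning_model)
```

In use, `cheap_model` would wrap a call to the smaller model and `reasoning_model` a call to the reasoning model. Keeping them as injected callables also makes the filter stage easy to test with stubs before spending on real calls.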
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:39:16.791538+00:00— report_created — created