Report #82794
[cost_intel] Latency and cost tradeoffs for multi-hop reasoning over knowledge bases versus retrieval-augmented generation with instruct models
For questions requiring two or more logical hops (e.g., 'What is the parent company of the supplier of X?'), o3-mini achieves 85% accuracy vs. GPT-4o's 60%, which justifies its 6x higher cost and 5s latency. For single-hop lookups, use GPT-4o with RAG (<$0.01/query).
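One way to sanity-check the "6x cost is justified" claim is to compare cost per correct answer rather than cost per query. The sketch below is illustrative only: `cost_per_correct` and `BASE_COST` are hypothetical names, the cost is normalized to 1.0 for GPT-4o because the report gives only the 6x ratio, and it assumes wrong answers can be detected and retried independently.

```python
def cost_per_correct(cost_per_query: float, accuracy: float) -> float:
    """Expected spend to obtain one correct answer.

    Assumes each attempt succeeds independently with probability `accuracy`
    and that failures are detected and retried (a simplifying assumption).
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_query / accuracy


# Normalized unit cost for one GPT-4o multi-hop query (assumption, not a real price).
BASE_COST = 1.0

instruct = cost_per_correct(BASE_COST, 0.60)       # GPT-4o at 60% accuracy
reasoning = cost_per_correct(6 * BASE_COST, 0.85)  # o3-mini at 6x cost, 85% accuracy

ratio = reasoning / instruct
print(f"o3-mini costs {ratio:.2f}x more per correct multi-hop answer")
```

Under these assumptions the effective premium drops from 6x to about 4.2x. Because instruct-model failures on multi-hop questions tend to be confident wrong answers that are hard to detect, the retry assumption likely understates the real cost of the cheaper model.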
Journey Context:
Architects mistakenly use reasoning models for simple lookups, which wastes money and adds latency, or try to force multi-hop reasoning through prompt engineering with instruct models, which leads to compounding errors. The typical failure mode of instruct models on multi-hop questions is a confident answer based on only a single-hop retrieval.
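The routing rule implied above can be sketched as a small dispatcher. This is a minimal sketch, not a tested production router: `Route`, `route_query`, and `min_reasoning_hops` are hypothetical names, the hop estimate is assumed to come from an upstream classifier that is not shown, and the default threshold of 2 is an assumption to tune against your own accuracy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str     # model name as used in the report
    strategy: str  # "rag_lookup" or "multi_hop_reasoning" (hypothetical labels)


def route_query(estimated_hops: int, min_reasoning_hops: int = 2) -> Route:
    """Pick a model from an estimated hop count.

    `estimated_hops` is assumed to come from an upstream classifier.
    `min_reasoning_hops` is an assumed threshold, not a measured value.
    """
    if estimated_hops < 1:
        raise ValueError("estimated_hops must be at least 1")
    if estimated_hops >= min_reasoning_hops:
        # Multi-hop: pay for the reasoning model instead of prompt-engineering
        # an instruct model into chained lookups.
        return Route(model="o3-mini", strategy="multi_hop_reasoning")
    # Single-hop lookup: the cheaper instruct model with RAG is sufficient.
    return Route(model="gpt-4o", strategy="rag_lookup")


single = route_query(1)  # e.g. "Who supplies X?"
multi = route_query(2)   # e.g. "What is the parent company of the supplier of X?"
print(single.model, multi.model)
```

Keeping the threshold as a parameter makes it easy to move the cutoff once you have measured where the accuracy gap actually opens up on your own data.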
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:33:34.983223+00:00 - report_created - created