Report #66201
[cost_intel] Using reasoning models for single-document retrieval QA (simple lookup tasks)
Use Haiku or GPT-4o-mini for single-hop RAG; enable reasoning only for HotpotQA-style multi-hop synthesis requiring connections across >2 documents.
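The routing rule above could be sketched roughly as follows. This is a minimal illustration, not a tested production router: the `pick_model` name, the two model-tier labels, and the `docs_needed` estimate are all hypothetical, and how you estimate the number of documents a question needs is left to the caller.

```python
# Hypothetical tier labels; substitute the model identifiers your provider uses.
CHEAP_MODEL = "cheap-non-reasoning"
REASONING_MODEL = "reasoning"


def pick_model(docs_needed: int, multi_hop_threshold: int = 2) -> str:
    """Pick a model tier for a retrieval QA query.

    docs_needed is an assumed, caller-supplied estimate of how many
    retrieved documents the answer has to connect.
    """
    if docs_needed < 1:
        raise ValueError("docs_needed must be at least 1")
    # Enable reasoning only when synthesis spans more than the threshold
    # (the report's rule of thumb is ">2 documents").
    if docs_needed > multi_hop_threshold:
        return REASONING_MODEL
    return CHEAP_MODEL
```

With the default threshold, a single-chunk lookup (`docs_needed=1`) stays on the cheap tier, and only queries connecting three or more documents are escalated.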
Journey Context:
On Natural Questions (single-hop), o3-mini achieves 92% vs GPT-4o-mini's 89%, but costs 15x more and is 10x slower. The accuracy cliff only appears on HotpotQA (multi-hop), where GPT-4o-mini drops to 40% while o3-mini holds at 85%. The signature to watch for: if the answer fits in one retrieved chunk and requires no synthesis across sources, reasoning is pure waste.
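One way to read the single-hop numbers above is as cost per correct answer. The sketch below uses only the reported figures (92% vs 89% accuracy, 15x cost) and normalizes the cheaper model's cost to 1.0; the `cost_per_correct` helper is hypothetical, and the metric is an illustrative assumption rather than something the report itself computes.

```python
def cost_per_correct(relative_cost: float, accuracy: float) -> float:
    """Relative cost spent per correctly answered question."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return relative_cost / accuracy


# Single-hop (Natural Questions) figures from the report.
# GPT-4o-mini's cost is normalized to 1.0; o3-mini is 15x that.
single_hop = {
    "GPT-4o-mini": cost_per_correct(1.0, 0.89),
    "o3-mini": cost_per_correct(15.0, 0.92),
}

# How many times more o3-mini costs per correct single-hop answer.
ratio = single_hop["o3-mini"] / single_hop["GPT-4o-mini"]
print(f"o3-mini costs {ratio:.1f}x more per correct answer")
```

Under these assumptions the 3-point accuracy gain barely dents the cost gap: o3-mini still comes out at roughly 14.5x the cost per correct answer on single-hop lookups.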
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:35:38.929382+00:00— report_created — created