Report #91684
[cost_intel] Overpaying for Sonnet in RAG answer synthesis tasks
Use Claude 3 Haiku for single-hop extraction from provided context; upgrade to Sonnet only when the answer requires multi-hop reasoning across more than 3 disconnected chunks or abductive inference.
Journey Context:
On single-hop QA with provided context, Haiku achieves >95% of Sonnet's F1 at roughly 1/12th of the input cost ($0.25/1M vs. $3.00/1M input tokens at Claude 3 list prices). The failure cliff is on HotpotQA-style multi-hop questions, where Haiku loses 15-20% accuracy against a 3-5% loss for Sonnet. The characteristic degradation signature is a response like 'the text does not contain the answer' when the context in fact contains the scattered facts and they only need to be synthesized. The cost-quality inflection point is detectable via confidence scores: Haiku reports low confidence on multi-hop queries.
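A minimal sketch of how the degradation signature above could drive escalation: try the cheap model first and retry on the stronger one only when the answer looks like a "not in the text" refusal. Everything here is an assumption for illustration. The model labels `haiku` and `sonnet` are placeholders rather than real API model IDs, the `ask` callable stands in for whatever client code is actually used, and the refusal pattern is a guess that should be tuned against real logs.

```python
import re
from typing import Callable, Sequence, Tuple

# Placeholder labels (assumption), not real API model identifiers.
CHEAP_MODEL = "haiku"
STRONG_MODEL = "sonnet"

# Heuristic pattern for the "the text does not contain the answer" signature.
# The exact wording list is an assumption; extend it from observed outputs.
_REFUSAL = re.compile(
    r"\b(text|context|passage|document)s?\b[^.]{0,40}?"
    r"\b(does not|doesn't|do not|don't)\s+(contain|include|provide|mention)",
    re.IGNORECASE,
)


def looks_like_false_refusal(answer: str) -> bool:
    """Return True if the answer resembles a 'not in the context' refusal."""
    return bool(_REFUSAL.search(answer))


def answer_with_escalation(
    question: str,
    chunks: Sequence[str],
    ask: Callable[[str, str, Sequence[str]], str],
) -> Tuple[str, str]:
    """Ask the cheap model first; escalate once if it appears to refuse.

    `ask(model, question, chunks)` is a caller-supplied function that
    returns the model's answer as a string. Returns (model_used, answer).
    """
    answer = ask(CHEAP_MODEL, question, chunks)
    if looks_like_false_refusal(answer):
        # The refusal may be genuine, but the cheap model cannot tell a
        # missing answer from one that needs multi-hop synthesis.
        return STRONG_MODEL, ask(STRONG_MODEL, question, chunks)
    return CHEAP_MODEL, answer
```

Note the trade-off: questions whose answer is truly absent from the context also trigger the second call, so the escalation rate should be monitored to confirm the cascade still saves money on a given workload.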
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:28:57.052462+00:00— report_created — created