Report #71710
[cost_intel] When does using Haiku for RAG retrieval queries fail compared to Sonnet?
Use Sonnet instead of Haiku for RAG retrieval when queries require disambiguating pronouns, temporal reasoning ('latest version'), or negation ('documents that do NOT mention X'). Haiku's 15-20% accuracy drop on these specific linguistic patterns cascades into generation errors that cost 10x more to fix downstream.
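The routing rule above can be sketched as a simple keyword heuristic. This is a minimal sketch, assuming plain word lists are good enough to flag the three risky patterns; the word lists, the function names, and the "haiku"/"sonnet" labels are hypothetical placeholders, not real model IDs or part of any Anthropic API.

```python
import re

# Hypothetical labels: substitute the model IDs your deployment actually uses.
CHEAP_MODEL = "haiku"
STRONG_MODEL = "sonnet"

# Illustrative, non-exhaustive word lists (assumptions, tune for your corpus).
PRONOUNS = {"he", "she", "him", "her", "his", "hers", "they", "them",
            "their", "theirs", "it", "its"}
TEMPORAL = {"latest", "newest", "recent", "recently", "current", "previous",
            "last", "earlier", "later", "before", "after", "since",
            "yesterday", "today"}
NEGATION = {"not", "no", "never", "without", "except", "excluding", "none"}


def find_triggers(query: str) -> list[str]:
    """Return which risky patterns appear in the query (empty list = none)."""
    words = re.findall(r"[a-z']+", query.lower())
    triggers = []
    if any(w in PRONOUNS for w in words):
        triggers.append("pronoun")
    if any(w in TEMPORAL for w in words):
        triggers.append("temporal")
    # Contractions such as "don't" or "isn't" also count as negation.
    if any(w in NEGATION or w.endswith("n't") for w in words):
        triggers.append("negation")
    return triggers


def route_retrieval_query(query: str) -> str:
    """Send queries with any risky pattern to the stronger model."""
    return STRONG_MODEL if find_triggers(query) else CHEAP_MODEL
```

The lists are deliberately broad: a false positive only sends a query to the more expensive model, while a false negative risks the failed retrieval the report describes.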
Journey Context:
Teams use Haiku for the initial retrieval step in RAG to save costs, assuming retrieval is 'just matching keywords.' However, Haiku struggles when understanding the query depends on context or commonsense reasoning. For example, 'What did he say about the budget?' requires resolving 'he' against the conversation history. Sonnet resolves these coreferences correctly 95% of the time versus Haiku's 75%.
The cost trap: a failed retrieval leaves the (expensive) generation model to hallucinate or answer 'I don't know,' which then requires a costly re-query or human intervention. The 10x downstream multiplier comes from the wasted generation tokens ($15/1M for Opus/Sonnet) and the follow-up fix, compared with the small savings on cheap Haiku query tokens ($0.25/1M).
Rule: if the query contains pronouns, temporal markers, or negation, route it to Sonnet; for keyword-heavy factual lookups (e.g., 'API rate limit value'), Haiku suffices.
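The cost trap can be made concrete with a rough expected-cost model. This is a sketch under stated assumptions: each failed retrieval wastes one full generation plus a fixed remediation cost (re-query or human review); the prices are the report's quoted figures ($0.25/1M for Haiku, $15/1M applied to both Sonnet retrieval and generation) and should be replaced with current pricing; the token counts and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RetrievalOption:
    name: str
    retrieval_price_per_m: float  # USD per 1M retrieval-query tokens (assumed)
    accuracy: float               # share of pronoun/temporal/negation queries resolved correctly


def expected_cost_usd(opt: RetrievalOption, retrieval_tokens: int,
                      generation_tokens: int, generation_price_per_m: float = 15.0,
                      fix_cost_usd: float = 0.0) -> float:
    """Expected cost of one query: retrieval + generation + expected failure cost."""
    retrieval = retrieval_tokens * opt.retrieval_price_per_m / 1e6
    generation = generation_tokens * generation_price_per_m / 1e6
    failure_rate = 1.0 - opt.accuracy
    # Assumption: a failure wastes the generation and incurs a fixed remediation cost.
    return retrieval + generation + failure_rate * (generation + fix_cost_usd)


def break_even_fix_cost(cheap: RetrievalOption, strong: RetrievalOption,
                        retrieval_tokens: int, generation_tokens: int,
                        generation_price_per_m: float = 15.0) -> float:
    """Remediation cost per failure above which the strong model is cheaper overall."""
    base_gap = (expected_cost_usd(strong, retrieval_tokens, generation_tokens, generation_price_per_m)
                - expected_cost_usd(cheap, retrieval_tokens, generation_tokens, generation_price_per_m))
    failure_gap = (1.0 - cheap.accuracy) - (1.0 - strong.accuracy)
    return base_gap / failure_gap


haiku = RetrievalOption("haiku", retrieval_price_per_m=0.25, accuracy=0.75)
sonnet = RetrievalOption("sonnet", retrieval_price_per_m=15.0, accuracy=0.95)

# Hypothetical sizes: 500-token retrieval prompt, 2,000-token generation.
print(expected_cost_usd(haiku, 500, 2000), expected_cost_usd(sonnet, 500, 2000))
print(break_even_fix_cost(haiku, sonnet, 500, 2000))
```

With these illustrative inputs the break-even remediation cost works out to roughly $0.007 per failed query, so on the hard query types any re-query or human touch costing more than that makes Sonnet the cheaper choice overall.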
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:56:48.513816+00:00— report_created — created