Report #96190
[cost_intel] When does Claude 3.5 Haiku fail catastrophically compared to Sonnet for RAG relevance ranking?
Haiku fails on 'implicit negative' queries (e.g., 'show me reviews without complaints') and on multi-hop synthesis ranking (more than 2 documents). Use Haiku for single-fact retrieval (entity lookup) with under 200 tokens of context; use Sonnet for judgment-based ranking or synthesis across more than 5 documents.
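The routing rule above could be sketched roughly as follows. This is a minimal illustration, not a tested production router: `RankingTask`, `choose_model`, and the `NEGATION_CUES` word list are hypothetical names introduced here, and the keyword check is only a crude stand-in for detecting implicit negation. The report leaves the 3-to-5 document range unspecified, so the sketch assumes the conservative choice and sends anything beyond a single-fact lookup to Sonnet.

```python
import re
from dataclasses import dataclass

HAIKU = "haiku"
SONNET = "sonnet"

# Hypothetical cue list: a rough keyword proxy for negation in a query.
NEGATION_CUES = re.compile(
    r"\b(without|not|no|except|excluding|never)\b|n't\b", re.IGNORECASE
)


@dataclass
class RankingTask:
    query: str
    synthesis_docs: int            # documents that must be combined to answer
    context_tokens: int            # size of the context passed to the ranker
    needs_judgment: bool = False   # judgment-based rather than factual ranking


def choose_model(task: RankingTask) -> str:
    """Pick a model tier using the thresholds stated in the report."""
    if NEGATION_CUES.search(task.query):
        return SONNET              # negation queries
    if task.needs_judgment:
        return SONNET              # judgment-based ranking
    if task.synthesis_docs > 2:
        return SONNET              # multi-hop synthesis (more than 2 documents)
    if task.context_tokens >= 200:
        return SONNET              # outside the under-200-token range for Haiku
    return HAIKU                   # single-fact retrieval, small context
```

For example, `choose_model(RankingTask("show me reviews without complaints", 1, 120))` routes to Sonnet because of the word "without", while a plain entity lookup with the same sizes stays on Haiku.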
Journey Context:
In RAG pipelines, Haiku is often used for the initial retrieval ranking to save costs. However, Haiku lacks the 'implicit negation' handling and cross-document contradiction detection that Sonnet has. Example task: 'Find documents that mention pricing complaints but not billing errors.' Haiku treats this as an OR query and returns billing-error docs. Cost delta: Haiku $0.25 per 1M tokens vs. Sonnet $3 per 1M tokens, but the error rate on negation tasks is 40% vs. 2%. On token price alone Haiku still costs less per correct answer (about $0.42 vs. $3.06 per 1M tokens at these rates), so for negation/multi-hop Sonnet becomes cheaper per correct answer only once the cost of acting on a wrong ranking (retries, bad downstream answers) is counted.
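The per-correct-answer comparison can be checked with a little arithmetic. The sketch below uses only the prices and error rates quoted above. `failure_cost` is a hypothetical parameter, not a figure from this report: it stands for the downstream cost of wrong rankings per 1M-token batch, and the model treats each batch as costing its token price plus the error rate times that failure cost. With these figures, token price alone leaves Haiku cheaper per correct answer, and Sonnet pulls ahead only when the failure cost exceeds the break-even value.

```python
def cost_per_correct(price_per_mtok: float, error_rate: float,
                     failure_cost: float = 0.0) -> float:
    """Expected spend per correct 1M-token batch of ranking work.

    failure_cost is an assumed downstream cost (in dollars) charged for a
    batch that is ranked wrongly; it is not a number from the report.
    """
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    expected_spend = price_per_mtok + error_rate * failure_cost
    return expected_spend / (1.0 - error_rate)


def break_even_failure_cost(cheap_price: float, cheap_err: float,
                            strong_price: float, strong_err: float) -> float:
    """Failure cost at which both models cost the same per correct batch."""
    numerator = strong_price * (1 - cheap_err) - cheap_price * (1 - strong_err)
    denominator = cheap_err * (1 - strong_err) - strong_err * (1 - cheap_err)
    return numerator / denominator


# Figures quoted in the report (negation tasks).
HAIKU_PRICE, HAIKU_ERR = 0.25, 0.40
SONNET_PRICE, SONNET_ERR = 3.00, 0.02

print(f"{cost_per_correct(HAIKU_PRICE, HAIKU_ERR):.2f}")    # 0.42
print(f"{cost_per_correct(SONNET_PRICE, SONNET_ERR):.2f}")  # 3.06
print(f"{break_even_failure_cost(HAIKU_PRICE, HAIKU_ERR, SONNET_PRICE, SONNET_ERR):.2f}")  # 4.09
```

Under this simple model, Sonnet is the cheaper choice for negation tasks whenever a wrong ranking costs more than roughly $4.09 per 1M-token batch in rework or bad downstream answers. The real threshold depends on how errors are actually priced in a given pipeline.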
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:02:11.688711+00:00— report_created — created