Report #30564
[cost_intel] Claude 3.5 Sonnet versus Haiku for RAG context synthesis and answer generation
Use Haiku/Flash for RAG answer synthesis when the retrieved chunks total <8k tokens and the answer requires no cross-document arithmetic or temporal reasoning. Use Sonnet/Pro only when the retrieved set contains contradictory information that requires reconciliation, or when the answer requires multi-hop synthesis across >5 chunks whose relationships are implied rather than explicit in the text.
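The rule above can be sketched as a small router. This is a minimal illustration, not a tested production policy: `RetrievalProfile`, its fields, and `choose_tier` are hypothetical names, and the flags are assumed to be set by upstream checks that this report does not describe. The rule does not say what to do with a non-contradictory set of 8k tokens or more, so the sketch makes the conservative assumption of escalating it.

```python
from dataclasses import dataclass

# Thresholds copied from the rule above; tune them against your own evals.
MAX_SMALL_MODEL_TOKENS = 8_000
MAX_SMALL_MODEL_CHUNKS = 5


@dataclass
class RetrievalProfile:
    """Hypothetical summary of a retrieved set, filled in by upstream checks."""
    total_tokens: int
    chunks_needed: int                      # chunks the answer must combine
    has_contradictions: bool = False
    needs_arithmetic: bool = False
    needs_temporal_reasoning: bool = False
    needs_implicit_synthesis: bool = False


def choose_tier(p: RetrievalProfile) -> str:
    """Return "large" (Sonnet-class) or "small" (Haiku-class)."""
    # Contradictory sources need reconciliation.
    if p.has_contradictions:
        return "large"
    # Multi-hop synthesis across many chunks with implied relationships.
    if p.chunks_needed > MAX_SMALL_MODEL_CHUNKS and p.needs_implicit_synthesis:
        return "large"
    # Cross-document arithmetic or temporal reasoning.
    if p.needs_arithmetic or p.needs_temporal_reasoning:
        return "large"
    # Assumption: escalate anything at or above the token budget.
    if p.total_tokens >= MAX_SMALL_MODEL_TOKENS:
        return "large"
    return "small"
```

A plain extraction query over two chunks and 3,000 tokens routes to the small tier; the same query with `has_contradictions=True` routes to the large tier.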
Journey Context:
A common mistake is using frontier models for all RAG 'just to be safe'. RAG quality is usually bounded by retrieval accuracy, not generation capability. For factual Q&A against retrieved chunks, Haiku extracts quotes and formulates answers nearly as well as Sonnet (within 3-4% accuracy per HELM RAG benchmarks) because the task is constrained to the provided context. Sonnet is needed in three failure modes: (1) the retrieved chunks contradict each other (e.g., doc A says 'policy changed in 2023', doc B says '2024'), where Haiku often misses the conflict and picks one while Sonnet flags the ambiguity; (2) the answer requires arithmetic across chunks (e.g., 'sum all the expenses mentioned in these 5 receipts'), where Haiku fails at cross-document math; (3) the answer depends on implicit synthesis (e.g., 'based on these meeting notes, who was the decision maker?'), which requires reading between the lines. For straight extraction and summarization of non-contradictory text, Haiku at 1/10th the cost is the correct economic choice.
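To make the economics concrete, the sketch below computes the average per-query cost of a routed setup relative to sending every query to Sonnet. It takes the 1/10th cost ratio from this report as given. The function name `blended_cost` and the escalation rate are illustrative assumptions, and the cost of whatever step decides the routing is ignored.

```python
def blended_cost(escalation_rate: float, small_to_large_ratio: float = 0.1) -> float:
    """Average per-query cost, where 1.0 means "everything goes to the large model".

    escalation_rate: fraction of queries routed to the large model (0 to 1).
    small_to_large_ratio: small-model cost as a fraction of large-model cost;
        0.1 reflects the 1/10th figure quoted in this report.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return escalation_rate + (1.0 - escalation_rate) * small_to_large_ratio


# Hypothetical example: if 20% of queries are escalated,
# the blended cost is roughly 0.28 of the all-Sonnet baseline.
relative_cost = blended_cost(0.20)
```

Under these assumptions the savings shrink linearly as the escalation rate grows, so the value of routing depends on how rarely the three failure modes actually occur in your traffic.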
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:41:13.186595+00:00— report_created — created