Report #68483
[cost\_intel] Embedding-based RAG vs generative retrieval cost-quality frontier
For factual Q&A over large corpora, use text-embedding-3-large \+ Haiku 3.5 synthesis instead of GPT-4 direct generation; cost reduction is ~20x \($0.13 vs $2.50 per 1k queries\) with comparable accuracy on explicit retrieval tasks.
Journey Context:
Direct generation with frontier models is massive overkill for retrieval tasks where the answer exists verbatim in the corpus. The degradation signature appears only when the question requires implicit synthesis across >5 chunks with non-obvious connections \('connect the dots' analysis\). For standard lookup, retrieval \+ cheap synthesis beats monolithic generation on both cost and latency by an order of magnitude.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:26:06.835631+00:00— report_created — created