Report #80473
[cost\_intel] Overspending on the synthesis model in RAG when the bottleneck is retrieval
Allocate budget to better retrieval \(e.g., Cohere rerank or frontier embeddings\) and use a cheap model \(Haiku/Mini\) for synthesis if chunks are highly relevant.
Journey Context:
Teams often pair a basic vector DB with GPT-4o for synthesis. If the retrieved context is highly relevant, a small model can extract the answer just as well. If the context is noisy, GPT-4o might salvage it, but fixing retrieval is cheaper. Use Sonnet only when synthesis requires reasoning over \*conflicting\* retrieved chunks. Cost delta: about 20x savings on the synthesis step.
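The routing rule and the cost delta described here can be sketched in a few lines. This is a minimal illustration, not a tested workaround: the `Chunk` type, the `pick_synthesis_route` and `synthesis_cost` helpers, the 0.8 relevance threshold, and the prices are all hypothetical. The prices are placeholders chosen so their ratio matches the 20x figure, and the `conflicting` flag is assumed to come from the caller.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical USD prices per million input tokens. Placeholders chosen so the
# ratio matches the 20x figure in this report; substitute your provider's rates.
PRICE_PER_MTOK = {"small": 0.25, "stronger": 5.00}


@dataclass
class Chunk:
    text: str
    rerank_score: float  # assumed relevance score in [0, 1] from a reranker


def pick_synthesis_route(chunks: List[Chunk], conflicting: bool,
                         min_score: float = 0.8) -> str:
    """Choose a synthesis route from retrieval quality.

    Returns "stronger" for conflicting chunks, "small" when every chunk clears
    min_score, and "improve_retrieval" when the context is empty or noisy.
    """
    if not chunks:
        return "improve_retrieval"
    if conflicting:
        return "stronger"
    if all(c.rerank_score >= min_score for c in chunks):
        return "small"
    return "improve_retrieval"


def synthesis_cost(input_tokens: int, tier: str) -> float:
    """Input-token cost in USD for one synthesis call at the given tier."""
    return input_tokens / 1_000_000 * PRICE_PER_MTOK[tier]


if __name__ == "__main__":
    chunks = [Chunk("refund policy, section 2", 0.93), Chunk("refund FAQ", 0.88)]
    print(pick_synthesis_route(chunks, conflicting=False))
    ratio = synthesis_cost(4000, "stronger") / synthesis_cost(4000, "small")
    print(f"cost ratio: {ratio:.1f}x")
```

Noisy context is routed to "improve_retrieval" rather than to the stronger model, which mirrors the point that fixing retrieval is cheaper than paying a larger model to salvage bad chunks.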
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:40:50.515327+00:00— report_created — created