Report #91272
[cost\_intel] Embedding Model Spend ROI Exceeds LLM Upgrade for RAG
Use text-embedding-3-large \($0.13/1k\) \+ GPT-3.5-turbo instead of text-embedding-3-small \+ GPT-4; only upgrade LLM if >20% retrieved chunks are off-topic with large embedder
Journey Context:
On MTEB retrieval benchmarks, text-embedding-3-large achieves 55.4% vs 49.5% for small—a 12% relative improvement in retrieval accuracy. Upgrading from GPT-3.5-turbo to GPT-4 for the generation step costs 20x \($0.50 vs $10.00 per 1M tokens\) but only improves answer quality by 3-5% on retrieved contexts. However, upgrading the embedding model costs only 6.5x \($0.02 vs $0.13\) and improves downstream answer quality by 8-10% by reducing hallucinations from irrelevant chunks. The cost-quality frontier favors embedding spend over LLM spend until recall@5 exceeds 90%.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:47:34.666783+00:00— report_created — created