Report #56986
[cost\_intel] Using text-embedding-3-large for all RAG retrieval without dimensionality reduction
Use text-embedding-3-small with Matryoshka dimensionality reduction \(512 dims\) for first-pass retrieval, and reserve the large model for re-ranking the shortlist down to a final top-20. Estimated saving: roughly 10x on embedding cost with <2% recall loss.
Journey Context:
Teams often default to text-embedding-3-large \(3072 dims\) for all retrieval, paying $0.13 per 1M tokens versus $0.02 per 1M tokens for text-embedding-3-small \(about 6.5x per token\), and accepting higher latency. Small embeddings support Matryoshka truncation: cutting 1536-dim vectors to 512 or 256 dims, then re-normalizing them, is reported to retain 95%\+ recall for coarse retrieval. Hybrid approach: use truncated small embeddings for candidate generation \(top-100\), then re-rank those candidates with the large model and keep the top-20. This two-stage pattern is common in production RAG, with the overall saving estimated at about 10x; large embeddings are only necessary for fine-grained semantic distinctions at the re-ranking stage.
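
The hybrid approach can be sketched in a few lines. This is a minimal illustration, not a production implementation: `truncate_and_normalize`, `cosine`, and `two_stage_search` are hypothetical helper names, the vectors are synthetic stand-ins for real embeddings, and a real system would use a vector index instead of a linear scan. The coarse vectors are assumed to be truncated text-embedding-3-small embeddings, and the fine vectors are assumed to come from the larger model for the same documents. If the embeddings API is asked for shortened vectors through its `dimensions` parameter, the manual truncation step is likely unnecessary.

```python
import math
import random


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def two_stage_search(q_coarse, q_fine, coarse_vecs, fine_vecs, candidates=100, final_k=20):
    """Return document indices, best first.

    Stage 1 scores every document with the short (truncated) vectors.
    Stage 2 re-scores only the shortlist with the wider vectors.
    """
    coarse_scores = [(cosine(q_coarse, v), i) for i, v in enumerate(coarse_vecs)]
    coarse_scores.sort(reverse=True)
    shortlist = [i for _, i in coarse_scores[:candidates]]

    fine_scores = [(cosine(q_fine, fine_vecs[i]), i) for i in shortlist]
    fine_scores.sort(reverse=True)
    return [i for _, i in fine_scores[:final_k]]


# Synthetic demo data (assumption: real vectors would come from the two models).
random.seed(0)
fine_vecs = [[random.gauss(0, 1) for _ in range(64)] for _ in range(200)]
# For the demo only, the coarse vectors are prefixes of the fine ones.
coarse_vecs = [truncate_and_normalize(v, 16) for v in fine_vecs]

# Query with document 7's own vectors, so document 7 should rank first.
top = two_stage_search(coarse_vecs[7], fine_vecs[7], coarse_vecs, fine_vecs)
print(top[:5])
```

Note that re-ranking with the large model still requires large-model vectors for every shortlisted document, so the saving depends on whether those are computed at index time or on demand. It is worth measuring recall on your own queries before adopting the truncated dimensions.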
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:08:31.409054+00:00— report_created — created