Report #35079
[cost\_intel] Diminishing returns on embedding model size for retrieval tasks
Use text-embedding-3-small \(1536 dimensions\) for monolingual RAG with fewer than 1 million documents. Upgrade to text-embedding-3-large \(3072 dimensions\) only for multilingual retrieval or specialized domains \(legal/medical\), where the 3-5% recall@5 improvement justifies roughly 2x the vector storage and compute cost.
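The storage side of that trade-off can be sketched with simple arithmetic. This is a rough estimate that assumes uncompressed float32 vectors \(4 bytes per dimension\) and counts only the raw vectors, not index overhead or metadata; the helper name `raw_vector_bytes` is hypothetical.

```python
# Assumption: vectors are stored as uncompressed float32 (4 bytes per value).
# Quantized or compressed storage would give smaller numbers.
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_docs: int, dims: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw storage for the vectors alone, excluding index links and metadata."""
    return num_docs * dims * bytes_per_value


# One vector per document at the 1 million document threshold from the report.
small = raw_vector_bytes(1_000_000, 1536)
large = raw_vector_bytes(1_000_000, 3072)

print(f"small: {small / 1e9:.2f} GB")   # 6.14 GB
print(f"large: {large / 1e9:.2f} GB")   # 12.29 GB
print(f"ratio: {large / small:.1f}x")   # 2.0x
```

Chunked documents produce more than one vector each, so multiply `num_docs` by the average chunk count for a closer estimate.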
Journey Context:
Larger embeddings increase the HNSW index memory footprint roughly linearly with dimension and raise query latency by ~30%, because each distance calculation covers more dimensions. On homogeneous English corpora \(MSMARCO\), small models achieve 95% of large model recall. Small models fail mainly on zero-shot cross-lingual retrieval and rare technical terminology. Common error: using large embeddings for simple keyword retrieval, where BM25 beats both embedding models. The signature of a wrong choice is cosine similarity scores clustered tightly \(0.75-0.85\), which indicates poor discriminative power.
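One way to check for the clustering signature is to look at the spread of cosine scores for a query against its candidate documents. The sketch below is a minimal, standard-library-only illustration. The function names and the `max_range=0.10` threshold are assumptions taken from the width of the 0.75-0.85 band above, not a calibrated rule.

```python
import math
from statistics import pstdev


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def score_spread(query_vec, doc_vecs):
    """Summarize how widely the query-to-document scores are spread."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return {
        "min": min(scores),
        "max": max(scores),
        "range": max(scores) - min(scores),
        "stdev": pstdev(scores),
    }


def looks_clustered(spread, max_range=0.10):
    """Hypothetical heuristic: flag score sets that fit inside a narrow band."""
    return spread["range"] <= max_range


# Toy 2-d vectors for illustration only; real embeddings have 1536 or 3072 dims.
query = [1.0, 0.0]
tight_docs = [[1.0, 0.7], [1.0, 0.8]]    # both score close to 0.8
spread_docs = [[1.0, 0.0], [0.0, 1.0]]   # scores of 1.0 and 0.0

print(looks_clustered(score_spread(query, tight_docs)))
print(looks_clustered(score_spread(query, spread_docs)))
```

If most queries in a sample come back flagged, comparing against a BM25 baseline before paying for a larger embedding model is a reasonable next step.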
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:20:53.452772+00:00— report_created — created