Report #75768
[cost_intel] Using large embedding models and high dimensions for all vector search workloads
Switch to text-embedding-3-small or use shortened (Matryoshka) dimensions. For most RAG retrieval tasks, 512 dimensions match 3072 dimensions in top-5 recall while cutting storage and compute costs by 4-6x.
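A minimal sketch of what "Matryoshka dimensions" means in practice, assuming the embedding model was trained so that a prefix of the vector is still a usable embedding (OpenAI documents this for the text-embedding-3 family and exposes it through the `dimensions` request parameter). The helper name `truncate_embedding` and the toy vector are hypothetical; the sketch only shows the client-side equivalent, which is to keep the first components and rescale to unit length.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components and rescale the result to unit length.

    Assumption: the source model supports prefix truncation (Matryoshka-style).
    For models not trained this way, truncation can hurt retrieval quality badly.
    """
    if dims <= 0 or dims > len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


# Toy 8-component vector standing in for a real embedding (hypothetical data).
full = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
short = truncate_embedding(full, 4)
print(len(short), round(sum(x * x for x in short), 6))
```

Renormalizing matters because cosine and dot-product scores assume unit-length vectors. Whether the shortened vectors hold up is workload-specific, so measure recall on your own queries before switching an index over.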
Journey Context:
Developers often default to the largest embedding model on the assumption that higher dimensionality means better search. For standard semantic search, however, the marginal benefit of dimensions beyond 512 drops off rapidly. OpenAI's `text-embedding-3-small`, shortened from its native 1536 dimensions to 512, is cheap to run and fast to query. The tradeoff is felt only at massive scale (billions of vectors) or in highly nuanced semantic similarity tasks. For standard RAG, large embeddings act as a silent cost multiplier on vector DB storage and compute.
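The storage side of that multiplier is simple arithmetic, sketched below under stated assumptions: vectors are stored as float32 (4 bytes per component), the corpus size of 10 million vectors is hypothetical, and index overhead (graph links, metadata, replicas) is ignored. The function names are illustrative and not part of any vector DB API.

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 components


def raw_vector_bytes(num_vectors, dims, bytes_per_component=BYTES_PER_FLOAT32):
    """Raw payload size of the stored vectors, excluding index overhead."""
    return num_vectors * dims * bytes_per_component


def shrink_factor(dims_before, dims_after):
    """How many times smaller the raw payload becomes after shortening."""
    return dims_before / dims_after


num_vectors = 10_000_000  # hypothetical corpus size
for dims in (3072, 1536, 512):
    gigabytes = raw_vector_bytes(num_vectors, dims) / 1e9
    print(f"{dims} dims: {gigabytes:.2f} GB raw")

print("3072 -> 512:", shrink_factor(3072, 512))
print("1536 -> 512:", shrink_factor(1536, 512))
```

Under these assumptions the raw payload shrinks 6x going from 3072 to 512 dimensions and 3x going from 1536 to 512. Brute-force similarity cost scales roughly linearly with dimension as well, but real bills depend on the index type and the provider's pricing, so treat these numbers as an upper-level estimate.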
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:46:35.698560+00:00— report_created — created