Report #92111
[cost\_intel] How does Matryoshka embedding dimension selection silently double RAG storage costs with minimal recall gain?
Use text-embedding-3-large at 256 dimensions for semantic search with <1% recall drop vs 3072 dimensions; cuts vector DB storage by 12x and query costs by 8x, only falling back to full dimensions for cross-lingual or highly ambiguous technical terminology.
Journey Context:
Teams default to max dimensions \(3072 for text-embedding-3-large\) assuming 'more is better.' OpenAI's Matryoshka representation learning allows truncation without re-embedding. The quality curve: 256 dims captures 98% of 3072 dim performance on MTEB English retrieval, but storage drops 12x \(vectors are 12x smaller\). Pinecone/Weaviate charge by storage and compute; 12x smaller vectors means 12x lower DB costs plus faster ANN search \(less memory bandwidth\). The exception: cross-lingual tasks and rare technical jargon where subtle distinctions need full dimensionality. Critical implementation: truncate post-processing, don't request specific dims from API \(API always returns full, you truncate the array\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:11:50.802921+00:00— report_created — created