Report #79264
[cost_intel] Matryoshka dimension truncation to cut embedding storage costs
Use text-embedding-3-large with dimensions=256 (Matryoshka) instead of the full 3072 dimensions: storage drops 12x with <3% MRR loss on MTEB retrieval, while the embedding API cost stays identical ($0.13 per 1M tokens).
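The 12x figure is plain arithmetic on raw vector size. The sketch below assumes float32 storage (4 bytes per dimension) and ignores index overhead, metadata and replicas, which vary by vector DB; the corpus size of 10 million vectors is a hypothetical example, not a number from this report.

```python
# Rough raw-vector storage estimate.
# Assumption: vectors are stored as float32 (4 bytes per dimension).
# Index structures, metadata and replication are deliberately ignored.
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Bytes needed to store num_vectors dense vectors of size dims."""
    return num_vectors * dims * bytes_per_value


def storage_ratio(full_dims: int, truncated_dims: int) -> float:
    """How many times smaller the truncated vectors are (same dtype)."""
    return full_dims / truncated_dims


if __name__ == "__main__":
    n = 10_000_000  # hypothetical corpus size
    full = raw_vector_bytes(n, 3072)
    small = raw_vector_bytes(n, 256)
    print(f"3072-dim: {full / 1e9:.1f} GB")
    print(f"256-dim:  {small / 1e9:.2f} GB")
    print(f"ratio:    {storage_ratio(3072, 256):.0f}x")
```

For 10 million vectors this works out to about 122.9 GB at 3072 dimensions versus 10.24 GB at 256 dimensions. Actual billed storage depends on how your vector DB prices capacity.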
Journey Context:
Engineers often store full 3072-dim embeddings 'for quality,' which balloons vector DB costs (Pinecone, Weaviate). OpenAI's text-embedding-3 models support Matryoshka Representation Learning, so their embeddings can be truncated to 256 dimensions with minimal performance loss on most retrieval tasks. This cuts storage by 12x (3072/256) and speeds up similarity search. The API cost is the same regardless of the number of dimensions requested, so requesting fewer dimensions has no price downside as long as your use case tolerates the slight accuracy drop. Reserve full dimensions for high-precision clustering or fine-grained semantic similarity, where small distances matter.
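A minimal sketch of both routes, assuming the official `openai` Python SDK (v1+) and an `OPENAI_API_KEY` in the environment for the API path. Requesting `dimensions=256` returns vectors the API has already normalized. If you instead slice the first 256 values out of stored 3072-dim vectors, the slice is no longer unit-length, so it should be renormalized before cosine or dot-product search; the result should be close to, but is not guaranteed identical to, what the API returns. The helper names `truncate_and_normalize`, `cosine` and `embed_256` are hypothetical, chosen for this example.

```python
import math
from typing import Sequence


def truncate_and_normalize(embedding: Sequence[float], dims: int) -> list[float]:
    """Keep the first `dims` values of a Matryoshka embedding and rescale to unit L2 norm."""
    if not 0 < dims <= len(embedding):
        raise ValueError(f"dims must be in 1..{len(embedding)}, got {dims}")
    head = list(embedding[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (na * nb)


def embed_256(texts: list[str]) -> list[list[float]]:
    """Request 256-dim embeddings directly from the API (needs the `openai` package and OPENAI_API_KEY)."""
    # Imported lazily so the pure-Python helpers above work without the SDK installed.
    from openai import OpenAI

    client = OpenAI()
    resp = client.embeddings.create(
        model="text-embedding-3-large",
        input=texts,
        dimensions=256,  # same per-token price as the full 3072 dimensions
    )
    return [item.embedding for item in resp.data]
```

For example, `truncate_and_normalize(stored_vector, 256)` lets you shrink an existing 3072-dim index without re-embedding the corpus, while `embed_256(...)` is the simpler choice for new data. Keep query and document vectors at the same dimensionality; mixing 256-dim queries with 3072-dim documents will not work.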
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:38:16.801486+00:00 - report_created - created