Report #47186
[cost\_intel] Embedding model truncation: text-embedding-3-large 256d vs 3-small full dimension quality inversion
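Consider text-embedding-3-large truncated to 256 dimensions \(dimensions=256\) instead of text-embedding-3-small at its full 1536 dimensions when vector storage and search latency dominate your costs. In the MTEB averages OpenAI published at launch, 3-large at 256 dimensions scores 62.0 against 62.3 for 3-small at 1536, so retrieval quality is roughly on par while each vector is 6x smaller. The dimensions parameter does not change the input token count or the per-token price: 3-large is billed at a higher per-token rate than 3-small \($0.13 vs $0.02 per 1M tokens at launch; check current pricing\), so the savings come from storage and retrieval, not from the embedding call itself.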
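A minimal sketch of requesting shortened vectors, assuming the official openai Python SDK \(v1 or later\) and an OPENAI_API_KEY in the environment. The helper names make_client and embed are hypothetical and exist only for illustration; the model name and the dimensions parameter are the ones named in this report.

```python
from typing import Any, List, Sequence

MODEL = "text-embedding-3-large"
TARGET_DIMS = 256  # shortened output size discussed in this report


def make_client() -> Any:
    """Create an OpenAI client (assumes OPENAI_API_KEY is set in the environment)."""
    # Imported lazily so embed() can be exercised with a stub client
    # on machines where the SDK is not installed.
    from openai import OpenAI
    return OpenAI()


def embed(client: Any, texts: Sequence[str], dims: int = TARGET_DIMS) -> List[List[float]]:
    """Return one vector of length `dims` per input text, in input order."""
    response = client.embeddings.create(
        model=MODEL,
        input=list(texts),
        dimensions=dims,  # ask the API for the shortened embedding directly
    )
    # Each item carries the index of the input it belongs to; sort defensively.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

Typical use would be vectors = embed(make_client(), ["first document", "second document"]). Passing the client in, rather than creating it inside embed, keeps the function easy to test with a stub.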
Journey Context:
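Teams assume 'smaller model = cheaper and worse, larger model = expensive and better,' but the text-embedding-3 models were trained with Matryoshka representation learning, which concentrates most of the semantic signal in the leading dimensions, so 3-large keeps most of its quality when shortened. At 256 dimensions it performs about as well as 3-small at the full 1536 dimensions while needing one sixth of the storage per vector. The per-token embedding price of 3-large is higher, but it is paid once per document, whereas the storage and search savings recur: fewer dimensions mean a smaller vector index and faster similarity search, with or without quantization. For large, long-lived indexes this can make the larger model the cheaper option overall, so compare total cost for your own corpus size and query volume rather than the per-token price alone.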
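The trade-off above can be estimated with a few lines of arithmetic. This is a rough sketch under stated assumptions: vectors are stored as float32 \(4 bytes per value\), index overhead is ignored, and the per-token price is passed in by the caller because prices change. All function names are hypothetical. The last helper shows manual shortening of an already stored full-length vector: keep the leading values, then rescale to unit length so cosine and dot-product scores stay comparable.

```python
import math
from typing import List, Sequence

BYTES_PER_FLOAT32 = 4  # assumption: vectors stored uncompressed as float32


def vector_bytes(dims: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw storage for one vector, ignoring index overhead."""
    return dims * bytes_per_value


def index_gib(num_vectors: int, dims: int) -> float:
    """Raw storage for a whole collection, in GiB."""
    return num_vectors * vector_bytes(dims) / 2**30


def embedding_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """One-time embedding cost; depends on tokens and price, not on dimensions."""
    return tokens / 1_000_000 * usd_per_million_tokens


def truncate_and_normalize(vec: Sequence[float], dims: int) -> List[float]:
    """Keep the first `dims` values and rescale the result to unit L2 norm."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]
```

For example, vector_bytes\(256\) is 1024 bytes against 6144 bytes for vector_bytes\(1536\), and ten million vectors come to about 9.5 GiB against about 57.2 GiB. Requesting dimensions=256 from the API is generally simpler than truncating afterwards; the manual route is mainly useful when full-length vectors are already stored.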
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:40:27.372189+00:00— report_created — created