Report #49621
[cost\_intel] Using 3072-dim embeddings by default; not realizing 512-dim has 6x lower storage cost and often comparable or better recall
Use text-embedding-3-large with dimensions=512; evaluate recall@k before defaulting to full dims
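A minimal sketch of the fix above, assuming the openai Python package (v1 or later) and an OPENAI_API_KEY environment variable. The helper names embed_texts and truncate_and_normalize are hypothetical and chosen for illustration. The second helper covers the case where full 3072-dim vectors are already stored: keeping the leading components and re-normalizing is the usual way to shorten a Matryoshka-style embedding after the fact, but verify it against your own recall numbers.

```python
import math


def embed_texts(texts, dims=512, model="text-embedding-3-large"):
    """Request embeddings already shortened to `dims` dimensions.

    Hypothetical helper. The import is deferred so the rest of this
    module works without the openai package installed.
    """
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(
        model=model,
        input=texts,
        dimensions=dims,  # shortens the returned vectors
    )
    return [item.embedding for item in response.data]


def truncate_and_normalize(vector, dims=512):
    """Shorten a stored full-size vector: keep the first `dims` values,
    then rescale to unit length so cosine and dot product stay consistent."""
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero input: nothing to rescale
    return [x / norm for x in head]
```

Calling embed_texts(["some chunk"]) would return one 512-value list per input, billed at the same per-token price as the 3072-dim request.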
Journey Context:
OpenAI's text-embedding-3-large returns 3072 dimensions by default and costs $0.13 per 1M input tokens. The model was trained with Matryoshka Representation Learning, so setting dimensions=512 shortens the output with little quality loss on most retrieval tasks. The API price does not change: a 512-dim request still costs $0.13 per 1M tokens. The 6x saving is downstream. Because 512 is one sixth of 3072, vector DB storage drops by 6x, and index memory and per-query bandwidth shrink with it. Whether smaller dims are also cheaper per API call depends on the provider; Voyage and Cohere are mentioned here as charging less, but that is unverified, so check current price sheets. The real trap is assuming higher dims = better retrieval. For cosine similarity search over short documents (chunks under 512 tokens), 512-dim can score higher recall@10 than 3072-dim; the suggested cause is overfitting in high-dimensional spaces with sparse data. Treat that as a hypothesis to test, not a guarantee, and always benchmark with your own chunk size.
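The storage arithmetic and the benchmark can be sketched as follows. This is an illustrative brute-force harness, not a production evaluator: it assumes float32 vectors (4 bytes per value), ignores index overhead, and expects you to supply your own labeled query-to-relevant-document pairs. All function names are hypothetical.

```python
import math


def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for the vectors alone (float32 assumed, no index overhead)."""
    return num_vectors * dims * bytes_per_value


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Ids of the k documents most similar to the query (exhaustive search)."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(query_vecs, doc_vecs, relevant, k=10):
    """Mean, over queries, of the fraction of relevant docs found in the top k.

    query_vecs: {query_id: vector}
    doc_vecs:   {doc_id: vector}
    relevant:   {query_id: set of relevant doc_ids}
    """
    scores = []
    for query_id, query_vec in query_vecs.items():
        gold = relevant.get(query_id, set())
        if not gold:
            continue  # skip queries without labels
        found = set(top_k(query_vec, doc_vecs, k)) & gold
        scores.append(len(found) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0
```

For 1M vectors, raw_vector_bytes gives 12,288,000,000 bytes at 3072 dims and 2,048,000,000 bytes at 512 dims, which is the 6x gap. To compare recall, embed the same chunks and queries once at each dimension, call recall_at_k on both sets, and only keep the larger vectors if the difference justifies the storage.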
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:46:20.375086+00:00— report_created — created