Report #84159
[cost_intel] text-embedding-3: paying for 3072-dim vectors when 512-dim suffices
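Pass `dimensions=512` (or 256) when calling `text-embedding-3-large` or `text-embedding-3-small`. For `text-embedding-3-large` (3072 dimensions by default) this cuts raw vector storage by 6x (or 12x); for `text-embedding-3-small` (1536 dimensions by default) the saving is 3x (or 6x). Smaller vectors also reduce query latency in vector DBs, usually with little MRR@10 degradation on document retrieval tasks.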
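The storage ratio quoted above is plain arithmetic on vector width, and a minimal sketch makes it easy to rerun for your own corpus. It assumes vectors are stored as uncompressed float32 and ignores index overhead and metadata; the function names `raw_vector_bytes` and `reduction_factor` are hypothetical helpers, not part of any vendor SDK.

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 storage


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw payload size in bytes; excludes index overhead and metadata."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


def reduction_factor(default_dims: int, reduced_dims: int) -> float:
    """How many times smaller the reduced vectors are than the default ones."""
    return default_dims / reduced_dims


# Default output widths of the two models, compared against 512 and 256.
for model, default_dims in (
    ("text-embedding-3-large", 3072),
    ("text-embedding-3-small", 1536),
):
    for reduced_dims in (512, 256):
        factor = reduction_factor(default_dims, reduced_dims)
        print(f"{model}: {default_dims} -> {reduced_dims} dims = {factor:g}x smaller")
```

For one million vectors, `raw_vector_bytes(1_000_000, 3072)` is about 12.3 GB of raw floats, against about 2.0 GB at 512 dimensions. Actual billing depends on the vector DB's pricing model, so treat this as a lower bound on what is stored.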
Journey Context:
OpenAI's third-generation embedding models (`text-embedding-3-small` and `text-embedding-3-large`) support native dimension reduction through the `dimensions` API parameter (e.g., `dimensions: 512`). By default, `text-embedding-3-large` returns 3072 dimensions and `text-embedding-3-small` returns 1536. Many users keep the default, which inflates Pinecone/Weaviate storage costs (often $0.10-0.50 per million dimensions stored) and increases query latency because every comparison touches a larger vector. OpenAI trained these models with Matryoshka Representation Learning, so the leading dimensions carry most of the signal and the vectors can be shortened to 512 or 256 dimensions with only a small quality loss: OpenAI's published MTEB averages for `text-embedding-3-large` are 64.6 at 3072 dimensions, 64.1 at 1024, and 62.0 at 256. The trap: assuming "more dimensions = better accuracy" and not knowing the parameter exists. Alternatives: PCA post-processing (adds compute) or quantization (more complex to operate). The fix is API-level truncation: set `dimensions: 512` unless you are doing ultra-high-precision semantic similarity on short texts.
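A minimal sketch of the API-level fix, assuming the official `openai` Python package (v1 or later) and a client created with `OpenAI()`. The helper names `embed_texts` and `truncate_and_renormalize` are hypothetical. The second helper covers vectors that were already generated at full width: it keeps the leading components and rescales to unit length, which is an assumption about how you want to reuse stored vectors and not a replacement for re-embedding.

```python
import math
from typing import Any, List, Sequence


def embed_texts(
    client: Any,
    texts: Sequence[str],
    model: str = "text-embedding-3-large",
    dimensions: int = 512,
) -> List[List[float]]:
    """Request reduced-width embeddings directly from the API.

    `client` is assumed to expose `embeddings.create(...)` the way the
    `openai` package's `OpenAI()` client does.
    """
    response = client.embeddings.create(
        model=model,
        input=list(texts),
        dimensions=dimensions,  # the parameter this report is about
    )
    return [item.embedding for item in response.data]


def truncate_and_renormalize(vector: Sequence[float], dimensions: int) -> List[float]:
    """Shorten an already-stored full-width vector and rescale it to unit length."""
    head = list(vector[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


# Usage (needs network access and OPENAI_API_KEY, so it is left commented out):
# from openai import OpenAI
# vectors = embed_texts(OpenAI(), ["refund policy for annual plans"])
# assert len(vectors[0]) == 512
```

Whichever route you take, the vector DB index must be created with the same dimension count, and queries must be embedded with the same `dimensions` value as the stored documents.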
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:51:00.797234+00:00— report_created — created