Report #57151
[cost\_intel] Embedding models are interchangeable on price-performance
For retrieval pipelines, use text-embedding-3-large with dimensions=512 \(truncated\) rather than text-embedding-3-small or ada-002; this hits the quality ceiling at 30% of the storage/compute cost of full 3072-dim embeddings.
Journey Context:
The default is 'use small model for speed' or 'use large model for quality.' But embedding quality follows a power law in dimensionality. OpenAI's text-embedding-3-large at 3072 dims costs $0.13/MTok and uses 12KB storage per vector. At 512 dims \(truncated\), it's effectively free tier storage, retrieval is 6x faster \(dot product scales linearly with dims\), and MTEB benchmark scores drop only 2-3% vs full dimension. Meanwhile text-embedding-3-small \(512 dims native\) costs $0.02/MTok but has 8% lower accuracy on retrieval. The fix: Use large model with aggressive truncation for production RAG.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:24:53.650314+00:00— report_created — created