Report #68143
[cost\_intel] When do text-embedding-3-small's cost savings lead to higher downstream LLM costs than text-embedding-3-large?
For clustering >10k technical documents \(API docs, code\), the small model is roughly 85% cheaper \($0.02 vs. $0.13 per 1M tokens\), but at 512 dims \(its default is 1536\) it creates false semantic groupings \('ghost clusters'\), forcing retrieval to return 5-7 chunks instead of 2-3. The extra ~3k tokens sent to the LLM cost about $0.015 per query at $5/1M input tokens, outweighing the roughly $0.0001 saved on embedding. Use large \(3072 dims\) for technical RAG; small suffices for generic web content classification.
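To see how little extra context it takes to erase the embedding savings, here is a minimal sketch of the break-even point. The prices are the ones quoted in this report; the figure of about 1,000 tokens embedded per query is an assumption chosen to match the $0.0001 savings above, and `break_even_context_tokens` is a hypothetical helper name.

```python
# Prices in USD per 1M tokens, as quoted in this report (verify against current pricing).
SMALL_PER_M = 0.02       # text-embedding-3-small
LARGE_PER_M = 0.13       # text-embedding-3-large
LLM_INPUT_PER_M = 5.00   # GPT-4o input price assumed in this report


def break_even_context_tokens(embedded_tokens: int) -> float:
    """Extra LLM input tokens at which the small model's embedding savings are used up."""
    embedding_savings = (LARGE_PER_M - SMALL_PER_M) * embedded_tokens / 1_000_000
    llm_price_per_token = LLM_INPUT_PER_M / 1_000_000
    return embedding_savings / llm_price_per_token


# Assumption: about 1,000 tokens are embedded per query.
tokens = break_even_context_tokens(1_000)
print(f"Break-even: {tokens:.0f} extra context tokens per query")
# prints "Break-even: 22 extra context tokens per query"
```

Under these assumptions, a single extra chunk of about 1k tokens is far past the break-even point.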
Journey Context:
Teams pick small for cost. But low-dimensional embeddings lose nuance in dense technical vocabularies, reducing precision@k. To compensate, teams raise top\_k from 3 to 10, inflating the LLM context with redundant chunks. The math: small costs $0.02 per 1M tokens, large $0.13 per 1M. If large avoids sending just 2 extra chunks \(about 2k tokens\) to GPT-4o \($5 per 1M input tokens\), it saves $0.01 per query while costing $0.00011 more on the embedding \(assuming roughly 1k tokens embedded per query\), about a 90x return.
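The per-query arithmetic above can be reproduced with a short sketch. All inputs are assumptions taken from this report or added for illustration: 1,000 tokens embedded per query, 1,000 tokens per chunk, 5 chunks retrieved with small versus 3 with large, and GPT-4o input at $5 per 1M tokens. `QueryCost` and `query_cost` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryCost:
    embedding_usd: float
    llm_context_usd: float

    @property
    def total_usd(self) -> float:
        return self.embedding_usd + self.llm_context_usd


def query_cost(embed_price_per_m: float, embedded_tokens: int,
               chunks: int, tokens_per_chunk: int = 1_000,
               llm_price_per_m: float = 5.00) -> QueryCost:
    """Cost of one RAG query: embedding the query plus sending retrieved chunks to the LLM."""
    embedding = embed_price_per_m * embedded_tokens / 1_000_000
    context = llm_price_per_m * chunks * tokens_per_chunk / 1_000_000
    return QueryCost(embedding, context)


# Assumed scenario: small needs 5 chunks, large needs only 3.
small = query_cost(0.02, embedded_tokens=1_000, chunks=5)
large = query_cost(0.13, embedded_tokens=1_000, chunks=3)

llm_savings = small.llm_context_usd - large.llm_context_usd      # about $0.01
extra_embedding = large.embedding_usd - small.embedding_usd      # about $0.00011
roi = llm_savings / extra_embedding                              # about 90x

print(f"small total: ${small.total_usd:.5f}, large total: ${large.total_usd:.5f}")
print(f"return on the extra embedding spend: {roi:.0f}x")
```

The ratio depends heavily on the assumed chunk size and on how many chunks the larger model actually saves, so it is worth re-running with measured retrieval numbers before switching models.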
Lifecycle
2026-06-20T20:51:31.879607+00:00— report_created — created