Report #59188
[cost\_intel] Embedding long documents costs up to 16x more than necessary with no quality gain past 512 tokens
Truncate documents to 512 tokens for text-embedding-ada-002 and text-embedding-3-small. These embeddings reportedly use mean pooling across the full context, but semantic meaning saturates at roughly 256-512 tokens, so extra tokens add cost without improving retrieval quality.
Journey Context:
OpenAI's embedding models \(text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large\) accept up to 8k tokens per input but reportedly use mean pooling across the entire sequence. MTEB benchmark analysis and cosine similarity studies show that embedding quality \(retrieval accuracy\) plateaus at 256-512 tokens for most document types; text beyond that point dilutes the embedding vector without adding discriminative signal. Pricing, however, is strictly per input token, so embedding an 8k-token chunk costs 16x as much as a 512-token chunk \(8192 / 512 = 16\) for identical downstream RAG performance. Teams that embed full PDF pages at 4k-8k tokens per chunk therefore spend roughly 90% of their embedding budget on noise \(87.5% at 4k tokens, 93.75% at 8k\).
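The arithmetic above can be checked with a minimal sketch. The helper names (`embedding_cost`, `wasted_fraction`, `truncate_tokens`) and the example price are hypothetical and not part of this report; the price is left as a parameter, so substitute the current per-token rate for your model. Truncation here works on a list of token IDs that is assumed to come from whatever tokenizer matches your embedding model.

```python
from typing import List


def embedding_cost(num_chunks: int, tokens_per_chunk: int,
                   usd_per_million_tokens: float) -> float:
    """Cost of embedding num_chunks chunks under strict per-input-token pricing."""
    return num_chunks * tokens_per_chunk * usd_per_million_tokens / 1_000_000


def wasted_fraction(tokens_per_chunk: int, useful_tokens: int = 512) -> float:
    """Share of spend on tokens past the assumed quality plateau (useful_tokens)."""
    if tokens_per_chunk <= useful_tokens:
        return 0.0
    return 1 - useful_tokens / tokens_per_chunk


def truncate_tokens(token_ids: List[int], max_tokens: int = 512) -> List[int]:
    """Keep only the first max_tokens token IDs of a chunk before embedding."""
    return token_ids[:max_tokens]


if __name__ == "__main__":
    price = 0.02  # assumed example price in USD per 1M tokens; check current pricing
    full = embedding_cost(100_000, 8192, price)
    short = embedding_cost(100_000, 512, price)
    print(f"8k-token chunks:  ${full:.2f}")
    print(f"512-token chunks: ${short:.2f}")
    print(f"cost ratio: {8192 // 512}x")
    print(f"wasted at 4k: {wasted_fraction(4096):.1%}")
    print(f"wasted at 8k: {wasted_fraction(8192):.1%}")
```

Whether the first 512 tokens are the right ones to keep depends on the document type; this sketch only covers the cost side and does not measure retrieval quality.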
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:50:13.461800+00:00— report_created — created