Report #52070
[cost\_intel] OpenAI text-embedding-3-small dimensions parameter reducing storage but not API cost
Do not use the \`dimensions\` parameter to reduce API costs; it only shortens the output vector, which saves storage. Batching does not reduce the bill either, because embeddings are billed per input token no matter how the texts are grouped into requests. Pack multiple texts into a single request to cut HTTP round-trips and latency, keeping each input within the model's roughly 8k-token limit.
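A minimal sketch of packing several texts into one request, assuming a v1-style official `openai` Python client is passed in as `client` and that it exposes `embeddings.create(model=..., input=[...])`. The helper names `chunked` and `embed_in_batches` and the default `batch_size` of 100 are hypothetical choices for illustration; check the current API reference for the actual per-request limits.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_in_batches(client, texts: Sequence[str], batch_size: int = 100,
                     model: str = "text-embedding-3-small"):
    """Embed `texts` with one request per batch instead of one per text.

    `client` is assumed to behave like the official openai client.
    Returns (vectors, total_tokens_billed).
    """
    vectors: List[List[float]] = []
    total_tokens = 0
    for batch in chunked(texts, batch_size):
        response = client.embeddings.create(model=model, input=batch)
        # Sort by index so the vectors line up with the input order.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(item.embedding)
        # Billed tokens are reported per request; batching only changes
        # how many requests the same tokens are spread across.
        total_tokens += response.usage.total_tokens
    return vectors, total_tokens
```

With a real client this would be called as `embed_in_batches(OpenAI(), texts)`. The token total it returns should match what one-text-per-request calls would have reported; only the number of round-trips changes.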
Journey Context:
Unlike generation models, which also bill for output tokens, embedding models charge only per input token processed, with no discount for batching multiple texts into one request. Users often conflate the \`dimensions\` parameter \(which reduces the output vector size, for example from the default 1536 to 256\) with cost savings. The API still processes the full input text and runs the full model; the vector is shortened only at the output. You therefore pay for the full 1536-dimensional computation but receive 256 dimensions, which saves storage and nothing on the API bill. The real trap is sending one short text per API call: this adds HTTP overhead and serial latency, yet the token cost is identical to sending 10 short texts together in one request \(the \`input\` field accepts a list, with each text subject to the roughly 8k-token input limit\). The fix is to maximize batch utilization to minimize HTTP round-trips \(a latency optimization\) and to treat dimension reduction as a storage optimization, not an API cost reduction.
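The arithmetic behind this can be sketched as follows. The price of $0.02 per million input tokens, the 50-token text length, and float32 storage are assumptions for illustration only (check the current pricing page), and the function names are hypothetical.

```python
def api_cost_usd(token_counts, price_per_million_usd=0.02):
    """API cost depends only on the total number of input tokens.

    `price_per_million_usd` is an assumed illustrative price. Neither the
    `dimensions` setting nor the number of requests appears in the formula.
    """
    return sum(token_counts) * price_per_million_usd / 1_000_000


def storage_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw vector storage, assuming float32 (4 bytes) per component."""
    return num_vectors * dimensions * bytes_per_value


# Ten short texts of 50 tokens each (made-up token counts).
tokens_per_text = [50] * 10

# One request per text vs. one request carrying all ten texts.
cost_unbatched = sum(api_cost_usd([n]) for n in tokens_per_text)
cost_batched = api_cost_usd(tokens_per_text)

# Storage for one million vectors at full and reduced dimensionality.
full = storage_bytes(1_000_000, 1536)
reduced = storage_bytes(1_000_000, 256)
```

Under these assumptions the two cost figures are equal, while raw storage drops by a factor of 6 (1536 / 256) when vectors are shortened to 256 dimensions.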
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:53:34.638812+00:00 — report_created — created