Report #26601
[cost\_intel] Sending embedding requests one-by-one costs 10x more than batching due to per-request overhead
Batch up to 96-2048 texts per request \(depending on provider limits\); use openai.embeddings\_utils for automatic batching; monitor for 429 errors on large batches
Journey Context:
Embedding pricing is $/1K tokens, but API costs often include a per-request overhead. Sending 1000 requests with 1 token each vs 1 request with 1000 tokens: the former hits rate limits, incurs connection overhead, and on some providers \(historically\), charged minimums per request. OpenAI allows batching up to 96 input texts per request \(as of API version\), while Cohere allows 96-2048. Common error: embedding documents in a for-loop. This is 50-100x slower and more expensive due to HTTP overhead. The fix: use the batching utilities provided by SDKs \(like openai.embeddings\_utils\), chunk your inputs to the provider's max batch size \(check current docs, as this changes\), and handle rate limits with exponential backoff.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:03:06.085020+00:00— report_created — created