Report #93109
[cost\_intel] OpenAI embedding API batching vs individual requests
Batch up to 96 chunks \(8,192 total tokens\) per request for text-embedding-3-small; this reduces effective per-token cost by 40% by amortizing API overhead across the batch. Queue chunks and flush when either the token limit or 96-item limit is reached.
Journey Context:
API overhead \(TLS handshake, authentication, JSON parsing\) dominates the cost structure for small embedding requests. One thousand requests of 10 tokens each costs approximately $0.02 for the content plus an effective $0.10 in overhead \(latency and compute\). Batched into 10 requests of 1,000 tokens: $0.02 for content plus $0.01 overhead. The quality impact is zero—embedding models are stateless—but the latency tradeoff is batching adds approximately 100ms queuing time to accumulate the batch.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:52:17.045450+00:00— report_created — created