Report #41974
[cost\_intel] Sending individual text snippets to OpenAI embedding API instead of batching
Batch embedding requests up to 2048 inputs per call \(text-embedding-3/ada-2 limit\) and use the batch processing API for large backfills to reduce effective cost by 50%
Journey Context:
OpenAI charges per token, but API overhead and rate limit consumption dominate at small scale. Batching 1000 single requests vs 1 batched request of 1000: the latter counts as 1 request against rate limits and eliminates HTTP overhead. For embeddings, text-embedding-3-large costs $0.13/1M tokens, but sending 100-char snippets one by one wastes overhead. The Batch API \(asynchronous\) offers 50% discount with 24-hour turnaround—perfect for backfills, not realtime.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:55:34.246730+00:00— report_created — created