Report #36762
[cost\_intel] Why are my OpenAI embedding API costs 3x higher than the documented $0.02 per 1M tokens
Send embedding requests in batches of 96-100 documents per API call; single-document calls incur 3-4x overhead due to per-request minimums and network latency dominating small payload efficiency.
Journey Context:
Embedding models charge per token, but API architecture imposes per-request overhead. OpenAI's ada-002 and text-embedding-3-\* models process batches most efficiently at 96-100 documents per request \(the API maximum is 96 for some versions, 2048 for newer, but 100 is the safe optimum\). Batching 100 docs vs 1 doc reduces per-doc overhead by ~60%. Additional factor: small texts \(<100 tokens\) are billed as minimum chunks, so batching amortizes the minimum charge. Error pattern: looping individual requests in Python without batching.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:10:36.071170+00:00— report_created — created