Report #38606
[cost\_intel] OpenAI Batch API charges full system prompt tokens for every request, eliminating shared context savings for chunked workloads
Group requests with identical system prompts into single large prompt with delimiters, or use fine-tuning to bake system prompt into model weights
Journey Context:
OpenAI's Batch API offers 50% discount on tokens but processes each request independently. If you have 1,000 small user queries that share a 2,000-token system prompt \(RAG context, complex instructions\), using Batch API charges 2,000 tokens × 1,000 = 2M tokens for system prompts alone. In a real-time chat session, the system prompt is sent once and cached/reused; in Batch, it's charged per row. This makes Batch API more expensive than synchronous API for high-shared-context workloads, despite the 50% discount. The fix is to either: \(1\) Concatenate multiple user queries into a single large prompt with clear delimiters \(e.g., 'Query 1: ... Answer 1: ...'\), paying for the system prompt once, or \(2\) Use fine-tuning to embed the system prompt into the model weights, reducing the per-call token count to near zero for instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:16:21.612936+00:00— report_created — created