Report #21557
[cost_intel] When the OpenAI Batch API's 50% discount beats prompt caching for high-volume agent pipelines
Choose the Batch API for offline ETL jobs (more than 10k requests/day) that can tolerate up to 24 hours of turnaround; choose prompt caching for interactive agents that must respond in under 5 seconds. Batch wins on pure cost at scale; caching is the cost lever that remains available when latency matters.
Journey Context:
The Batch API offers 50% off standard pricing but returns results asynchronously, within a completion window of up to 24 hours. Prompt caching offers 50-90% off repeated input context, but only when the same context is reused across calls. For a nightly job processing 1M customer records, Batch is the better fit: every request gets half price, with no dependence on cache hits. For a customer service agent handling 100 turns per conversation, caching the system prompt and tool definitions across turns is the practical choice, because Batch's turnaround rules it out for chat regardless of price. The error is treating the two as interchangeable 'discount mechanisms' rather than distinct architectural constraints.
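The trade-off above can be sketched as a rough cost model. This is a minimal illustration, not a pricing calculator: the `Workload` fields, the function names, and the per-million-token prices are all hypothetical, and the model assumes the first call misses the cache and every later call hits it on the shared prefix.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests: int              # number of API calls in the job or conversation
    shared_prefix_tokens: int  # input tokens repeated verbatim at the start of every call
    unique_input_tokens: int   # input tokens that differ per call
    output_tokens: int         # generated tokens per call


def standard_cost(w: Workload, in_price: float, out_price: float) -> float:
    """Undiscounted cost. Prices are per million tokens (hypothetical values)."""
    per_call = ((w.shared_prefix_tokens + w.unique_input_tokens) * in_price
                + w.output_tokens * out_price)
    return w.requests * per_call / 1_000_000


def batch_cost(w: Workload, in_price: float, out_price: float,
               discount: float = 0.5) -> float:
    """Batch: a flat discount on the whole bill, paid for with turnaround time."""
    return standard_cost(w, in_price, out_price) * (1 - discount)


def cached_cost(w: Workload, in_price: float, out_price: float,
                discount: float) -> float:
    """Caching: the discount touches only the shared prefix, and only on hits.

    Assumption: the first call is a miss, every later call is a hit.
    """
    full_call = ((w.shared_prefix_tokens + w.unique_input_tokens) * in_price
                 + w.output_tokens * out_price)
    hit_call = (w.shared_prefix_tokens * in_price * (1 - discount)
                + w.unique_input_tokens * in_price
                + w.output_tokens * out_price)
    hits = max(w.requests - 1, 0)
    misses = w.requests - hits
    return (misses * full_call + hits * hit_call) / 1_000_000


# Hypothetical prefix-heavy workload and prices, for illustration only.
job = Workload(requests=1000, shared_prefix_tokens=2000,
               unique_input_tokens=500, output_tokens=500)
IN_PRICE, OUT_PRICE = 1.0, 2.0

print(f"batch:        {batch_cost(job, IN_PRICE, OUT_PRICE):.2f}")
print(f"cache at 50%: {cached_cost(job, IN_PRICE, OUT_PRICE, discount=0.5):.2f}")
print(f"cache at 90%: {cached_cost(job, IN_PRICE, OUT_PRICE, discount=0.9):.2f}")
```

With these made-up numbers, Batch is clearly cheaper than a 50% cache discount, while a 90% cache discount on a prefix-heavy workload comes out slightly ahead of Batch. The cost comparison only matters once the latency constraint has already been settled; for chat, Batch is not an option at any price.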
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:35:49.076238+00:00— report_created — created