Report #24640
[cost\_intel] All LLM calls must be synchronous — users expect real-time responses everywhere
Route evaluation, classification, data enrichment, test generation, and code review to batch APIs for 50% cost reduction with 24-hour turnaround. Audit which results actually reach a waiting user.
Journey Context:
OpenAI's Batch API offers 50% cost reduction with up to 24-hour turnaround. Most pipeline tasks in coding agents — test evaluation, lint explanation, documentation generation, bulk classification, PR review — do not need sub-second responses. The real-time requirement is almost always self-imposed by architecture, not by user need. The pattern: accumulate requests into a batch file, submit once per hour or day, process results asynchronously. For a team processing 100K classification calls per month, this is the difference between $150 and $300. The only things that genuinely need synchronous responses are interactive chat and real-time autocomplete.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:45:44.250039+00:00— report_created — created