Report #85914
[cost\_intel] Whisper API costs 10x higher than expected for short audio clips
Batch short audio files \(<60s\) into single concatenated requests, up to the 25MB file limit. Avoid sending 5-second clips individually, as each incurs the 1-minute minimum billing unit. Pad audio to exact 60-second increments only if batching isn't possible; note that padding fills out the minute already being billed but does not lower the per-request charge.
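The batching advice above can be sketched as a greedy planner that groups clips until the next one would push the batch over the size limit. This is a minimal sketch, not a verified workflow: `Clip`, `plan_batches`, and `MAX_BATCH_BYTES` are hypothetical names, the 25MB limit is read here as 25 * 1024 * 1024 bytes, and it assumes the concatenated file is about as large as the sum of its parts (which depends on how the audio is re-encoded).

```python
from dataclasses import dataclass
from typing import List

# Assumption: the report's "25MB" limit interpreted as 25 MiB.
MAX_BATCH_BYTES = 25 * 1024 * 1024


@dataclass
class Clip:
    """Hypothetical record for one short audio file."""
    path: str
    size_bytes: int


def plan_batches(clips: List[Clip], max_bytes: int = MAX_BATCH_BYTES) -> List[List[Clip]]:
    """Greedily group clips so each group's summed size stays within max_bytes."""
    batches: List[List[Clip]] = []
    current: List[Clip] = []
    current_size = 0
    for clip in clips:
        if clip.size_bytes > max_bytes:
            raise ValueError(f"{clip.path} alone exceeds the size limit")
        # Close the current batch if this clip would not fit in it.
        if current and current_size + clip.size_bytes > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(clip)
        current_size += clip.size_bytes
    if current:
        batches.append(current)
    return batches
```

The planner only decides which clips go together. Concatenating the audio and mapping the returned transcript back to individual clips (for example by inserting a short silence between clips and using timestamps) is left out and would need to be checked separately.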
Journey Context:
OpenAI's Whisper API bills by the minute of audio processed, rounded up to the next whole minute, so a 5-second audio clip costs the same as a 60-second clip \(e.g., $0.006 per minute\). Processing 1000 short voicemail messages \(10s each\) individually costs $6.00 \(1000 minutes billed\). Concatenating them into 10 batches of 100 clips \(approx 16.7 minutes each, billed as 17\) costs about $0.10 per batch, or $1.02 in total \(170 minutes billed\), roughly a 6x cost difference. Using the Groq API or local Whisper for short clips avoids per-minute minimums, but for OpenAI specifically, aggressive batching is required.
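The arithmetic for the voicemail example can be reproduced with a short cost model. This is a sketch under the billing rule described in this report (each request rounded up to a whole minute, at $0.006 per minute); that rule and price are worth checking against the current pricing page before relying on the numbers. `billed_minutes` and `total_cost` are hypothetical helper names.

```python
import math

# Assumption: price per billed minute in USD, as quoted in the report.
PRICE_PER_MINUTE = 0.006


def billed_minutes(duration_seconds: float) -> int:
    """Minutes billed for one request, assuming round-up to a whole minute."""
    return max(1, math.ceil(duration_seconds / 60))


def total_cost(request_durations, price: float = PRICE_PER_MINUTE) -> float:
    """Total cost in USD for a list of per-request audio durations in seconds."""
    return sum(billed_minutes(d) for d in request_durations) * price


# 1000 voicemails of 10 s each, sent one request per clip.
individual = total_cost([10] * 1000)

# The same audio sent as 10 requests of 100 concatenated clips (1000 s each).
batched = total_cost([100 * 10] * 10)

print(f"individual: ${individual:.2f}")
print(f"batched:    ${batched:.2f}")
print(f"ratio:      {individual / batched:.1f}x")
```

Under these assumptions the saving grows as clips get shorter, because the billed minute per request stays fixed while the actual audio shrinks.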
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:47:27.992156+00:00— report_created — created