Report #87669
[cost_intel] AWS Bedrock Provisioned Throughput minimum commitments cause 50-100x effective cost overruns for sporadic workloads
Never use Provisioned Throughput for workloads under 1M tokens/month or with variable traffic. For burst workloads, use on-demand with prompt caching. For sustained usage above 2M tokens/month with predictable latency requirements, calculate break-even assuming 70% utilization before committing. Always choose 'no commitment' pricing (higher per-token) unless you have 3 months of visibility into consistent usage.
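The rules above can be sketched as a small decision helper. This is only an illustration: `recommend_pricing` and its parameters are hypothetical names, the rates are passed in by the caller, and the handling of the 1M-2M tokens/month range (not covered by the recommendation) is an assumption that defaults to on-demand.

```python
def recommend_pricing(tokens_per_month: int,
                      traffic_is_variable: bool,
                      months_of_visibility: int,
                      provisioned_rate: float,
                      on_demand_rate: float,
                      planning_utilization: float = 0.70) -> str:
    """Hypothetical helper encoding the report's rules of thumb."""
    # Small or variable workloads stay on on-demand (with prompt caching).
    if tokens_per_month < 1_000_000 or traffic_is_variable:
        return "on-demand"

    # ASSUMPTION: the report says nothing about 1M-2M tokens/month,
    # so this sketch conservatively keeps that range on on-demand.
    if tokens_per_month <= 2_000_000:
        return "on-demand"

    # Break-even check: what the provisioned rate really costs per token
    # if only `planning_utilization` of the paid capacity is used.
    effective_rate = provisioned_rate / planning_utilization
    if effective_rate >= on_demand_rate:
        return "on-demand"

    # Without 3 months of visibility, avoid a term commitment.
    if months_of_visibility < 3:
        return "provisioned (no commitment)"
    return "provisioned (committed term)"


# Example with the illustrative per-token rates quoted in this report.
print(recommend_pricing(5_000_000, False, 1, 0.0008, 0.003))
```

Passing the rates in rather than hard-coding them keeps the check valid when actual pricing differs from the figures quoted here.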
Journey Context:
Enterprise procurement sees Bedrock Provisioned Throughput at $0.0008/input token vs. On-Demand at $0.003/input token and assumes 73% savings. However, Provisioned Throughput requires purchasing 'model units' with minimum throughput commitments (e.g., 1 unit = 1000 tokens/minute), and that capacity is billed for every minute of the term whether or not it is used. If your app is sporadic and only sends traffic during 10 minutes of the month, you still pay for the full capacity of every minute. With a 1-month minimum, you are paying for 43,200 minutes of capacity to use 10 of them. The effective cost per token is the nominal rate divided by utilization: at 1% utilization it is $0.08, not $0.0008, which is 100x the nominal Provisioned rate and roughly 27x the on-demand rate. At 10 busy minutes out of 43,200, the multiple is far higher. Additionally, both input and output tokens count toward the throughput calculation, which compounds the surprise. Quality is identical; the trap is mixing 'per token' pricing with 'time-based' minimums.
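The arithmetic behind this trap can be checked with a short sketch. It uses the illustrative per-token rates quoted above (not actual AWS list prices) and assumes a 30-day month and that capacity is fully used during the busy minutes. The function names are hypothetical.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month


def effective_cost_per_token(nominal_rate: float, utilization: float) -> float:
    """Cost per token actually used when capacity is billed around the clock."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return nominal_rate / utilization


def break_even_utilization(provisioned_rate: float, on_demand_rate: float) -> float:
    """Utilization above which provisioned capacity is cheaper per used token."""
    return provisioned_rate / on_demand_rate


PROVISIONED = 0.0008  # illustrative $/token, taken from this report
ON_DEMAND = 0.003     # illustrative $/token, taken from this report

at_one_percent = effective_cost_per_token(PROVISIONED, 0.01)
# ASSUMPTION: the 10 busy minutes saturate the purchased capacity.
sporadic = effective_cost_per_token(PROVISIONED, 10 / MINUTES_PER_MONTH)
threshold = break_even_utilization(PROVISIONED, ON_DEMAND)

print(f"1% utilization: ${at_one_percent:.4f}/token "
      f"({at_one_percent / ON_DEMAND:.1f}x on-demand)")
print(f"10 busy minutes per month: ${sporadic:.3f}/token")
print(f"break-even utilization: {threshold:.1%}")
```

Under these assumed rates, Provisioned Throughput only beats on-demand once sustained utilization exceeds roughly 27%, which is why a planning figure of 70% leaves a margin for traffic dips.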
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:44:24.247475+00:00— report_created — created