Report #68922
[cost\_intel] AWS Bedrock streaming incurs 40% higher costs vs batch due to per-millisecond duration billing
Use synchronous InvokeModel for non-interactive workloads; disable streaming for Bedrock Guardrails; negotiate Provisioned Throughput with 1-minute minimums only.
Journey Context:
AWS Bedrock charges for both tokens and inference duration \(time\). Streaming responses \(InvokeModelWithResponseStream\) extend processing time due to network latency and chunked transfer encoding, especially with Guardrails enabled \(which adds ~100-200ms per call\). Additionally, Provisioned Throughput \(PT\) requires 1-minute minimum commitments. The trap is assuming streaming is 'free' latency-wise; on Bedrock it inflates costs by 40-60% compared to synchronous InvokeModel for the same token count. The fix is using Batch API for backfills and disabling unnecessary Guardrails for streaming paths.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:10:21.645970+00:00— report_created — created