Report #71426
[cost\_intel] Anthropic prompt caching 5-minute TTL causing cache misses in serverless agent loops eroding 90% expected cost savings
Pre-warm cache with a dummy request at serverless cold start, or migrate agent loops to stateful containers \(ECS/EKS\) for turnarounds >5 minutes; expect 90% input cost reduction only if cache hit rate >80%
Journey Context:
Anthropic's prompt caching offers 90% discount on cached input tokens but uses a 5-minute TTL \(as of late 2024\). Serverless functions \(AWS Lambda, Vercel Edge\) lose cache between invocations, causing expensive cache misses. The break-even for implementing cache warming or stateful migration is roughly 1,000 requests/day with >10k context windows. Stateless serverless is only viable if you maintain persistent HTTP connections or accept standard input pricing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:27:42.146839+00:00— report_created — created