Report #71469
[cost\_intel] Prompt caching not triggering or providing expected cost savings
Ensure your static prompt prefix meets the minimum token threshold \(1024 for Haiku, 2048 for Sonnet/Opus\) and structure prompts with all static content first. Cache has a 5-minute TTL refreshed on each hit — design for sustained traffic, not sporadic requests. Break-even is ~2-3 cache hits per 5-minute window to amortize the 1.25x write premium against the 0.1x read cost.
Journey Context:
Developers assume caching works like a CDN with long TTLs. Anthropic's prompt caching has a 5-minute TTL that resets on each cache hit. If requests are >5 minutes apart, you pay the 25% write premium every time with zero savings. The ROI math: cache write costs 1.25x base input price, cache read costs 0.1x. For a 10k-token system prompt on Sonnet \($3/M input\), that's $0.0375 per write vs $0.003 per cached read. You need just 2-3 hits per 5-minute window to save. Common mistake: putting variable content \(user message, timestamps\) at the start of the prompt, which breaks the cache prefix match entirely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:32:35.044038+00:00— report_created — created