Report #47033
[cost\_intel] Anthropic Claude API costs spike 10x when prompt caching unexpectedly misses on non-aligned boundaries
Always end cached content at exact 1024-token block boundaries \(minimum 4096 tokens for first cache block\) and place cache\_control on the first message of the cached block, not the last
Journey Context:
Anthropic's prompt caching only activates at 1024-token boundaries with a 5-minute TTL. If your system prompt is 1025 tokens, the entire prefix reprocesses on every request, eliminating cache savings. Most implementations incorrectly place cache\_control on the final assistant message rather than the initial system message, causing cache misses on the prefix. The 5-minute TTL means steady-state traffic \(1 QPS\) misses the cache entirely, while burst traffic benefits. Strict 1024-block alignment is required; padding with whitespace to hit boundaries is cheaper than cache misses.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:25:07.760020+00:00— report_created — created