Report #72286
[cost_intel] Anthropic prompt caching delivering zero cost savings on reused system prompts
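Pad or truncate system prompts to an exact multiple of 1024 tokens. Token counts in this report were measured with tiktoken cl100k_base, which is an OpenAI tokenizer and only approximates Anthropic's token counts. As reported, a 1025-token prompt causes the first 1024-token block to miss the cache entirely, so it is billed at the full input rate instead of the cached-read rate (10% of the base input price).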
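The alignment arithmetic above can be sketched as follows. This is a minimal illustration under the report's unverified premise that caching works in 1024-token blocks; `BLOCK` and `alignment_targets` are hypothetical names, and the token count is assumed to come from whatever tokenizer or counting tool you trust.

```python
BLOCK = 1024  # block size claimed in this report; an assumption, not a documented value


def alignment_targets(token_count: int, block: int = BLOCK) -> tuple[int, int]:
    """Return (truncate_to, pad_to) for a prompt of token_count tokens.

    truncate_to is the largest multiple of block that is <= token_count.
    pad_to is the smallest multiple of block that is >= token_count.
    """
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    truncate_to = (token_count // block) * block
    pad_to = -(-token_count // block) * block  # ceiling division without floats
    return truncate_to, pad_to


if __name__ == "__main__":
    # The 1025-token prompt from the report: drop 1 token or add 1023.
    print(alignment_targets(1025))
```

For the 1025-token case, truncating to 1024 costs one token of content, while padding to 2048 adds 1023 tokens that are billed on every request, so truncation is usually the cheaper option when the extra token is not essential.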
Journey Context:
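This report assumes that Anthropic's cache keys are SHA256 hashes of 1024-token blocks; that mechanism is the reporter's inference and has not been confirmed. Under this assumption, a 1025-token prompt splits into Block 1 (tokens 0-1023) and Block 2 (token 1024). When the static content exactly fills Block 1, that block is written to the cache once (1.25x the base input price) and read on later requests (0.1x). At 1025 tokens, the reporter observed that Block 1 was treated as new on every request, which they attribute to a shift in the block boundary, so the hit rate dropped to zero. The reporter found this invisible in dashboard metrics; to spot the leakage, compute expected tokens × price and compare the result with the billed amount.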
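The expected tokens × price check can be sketched as below, using the 1.25x write and 0.1x read multipliers quoted above. This is a rough estimate, not a billing formula: `expected_input_cost` is a hypothetical helper, the price of 3.00 per million tokens is a placeholder, and the sketch assumes every request after the first arrives while the cache entry is still alive.

```python
def expected_input_cost(
    prompt_tokens: int,
    requests: int,
    price_per_mtok: float,
    cached: bool = True,
    write_mult: float = 1.25,  # cache-write multiplier quoted in this report
    read_mult: float = 0.10,   # cache-read multiplier quoted in this report
) -> float:
    """Estimate the total input cost of sending the same prompt `requests` times."""
    base = prompt_tokens * price_per_mtok / 1_000_000  # cost of one uncached request
    if not cached or requests == 0:
        return base * requests
    # First request writes the cache; every later request is assumed to hit it.
    return base * write_mult + base * read_mult * (requests - 1)


if __name__ == "__main__":
    price = 3.00  # placeholder price per million input tokens
    with_cache = expected_input_cost(1024, 100, price, cached=True)
    without_cache = expected_input_cost(1024, 100, price, cached=False)
    print(f"expected with cache:    {with_cache:.6f}")
    print(f"expected without cache: {without_cache:.6f}")
```

If the billed amount for the period sits near the uncached figure rather than the cached one, the cache is probably not being hit and the prompt layout is worth re-examining.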
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:55:00.970993+00:00 - report_created - created