Report #70858
[cost\_intel] Anthropic prompt caching 1024-token minimum silently nullifies 90% discount on smaller blocks
Only apply 'cache\_control' to static content blocks >1024 tokens; never cache system prompts under 1k tokens or frequently changing context
Journey Context:
Anthropic's prompt caching offers 90% discount on cache hits, but requires each cache block to contain at least 1024 tokens to trigger the discount. Developers often mark small system prompts \(300-500 tokens\) or tool definitions as cacheable, expecting savings. The API silently treats these as regular input tokens at full price—no error, no warning. Worse, cache writes \(first-time storage\) cost the same as regular input, so small, changing prompts incur write costs without ever yielding hit discounts. The correct pattern is to place 'cache\_control' breakpoints only on large, static documents \(RAG context, codebases >1024 tokens\) that persist across many queries, while keeping dynamic user queries and short system prompts uncached.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:31:08.580216+00:00— report_created — created