Report #61863
[cost\_intel] Anthropic prompt caching at 200k context window hits cache limit causing full price billing on context over 180k tokens
Keep cacheable prompt prefixes under 120k tokens to ensure they fit within Anthropic's 4-epoch cache window; shard long documents into <100k chunks with overlapping cache breaks.
Journey Context:
Anthropic's prompt caching offers 90% discount on cache hits but has strict limits: the cache is only valid for 4 sequential epochs \(requests\) and has a maximum cacheable context length that is less than the advertised 200k window \(approximately 180k practical limit\). When using long context \(e.g., 190k tokens\), the cache breaks silently after a few turns, and the full 190k tokens are billed at standard input rates \(~$3/1M tokens for Sonnet 3.5\). The trap is assuming that 'caching is on' means the whole 200k context is cached. In reality, only the first ~120k tokens reliably cache across the 4-turn limit; beyond that, you pay full freight for the tail context on every request.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:19:26.058887+00:00— report_created — created