Report #51173
[cost\_intel] Prompt caching hit rate stuck below 30% — getting it above 80%
Structure prompts as \[static\_prefix\] \+ \[semi\_static\_context\] \+ \[dynamic\_user\_input\]. Put the longest, most stable content first. Never put variable content \(timestamps, user IDs, request-specific data\) before static content. Caching requires an exact prefix match: a change at any token position invalidates the cache for everything after it.
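A minimal sketch of why ordering matters, under loud assumptions: `build_prompt` and `shared_prefix_tokens` are hypothetical helpers, and whitespace splitting is only a crude stand-in for the provider's real tokenizer. It compares how many leading tokens two consecutive requests share when the variable content goes last versus first.

```python
def build_prompt(static_prefix: str, session_context: str, user_input: str) -> str:
    """Assemble segments from most stable to least stable (hypothetical helper)."""
    return "\n\n".join([static_prefix, session_context, user_input])


def shared_prefix_tokens(a: str, b: str) -> int:
    """Count the leading tokens two prompts have in common.

    Assumption: whitespace splitting approximates tokenization well enough
    to illustrate prefix matching; a real tokenizer will give other counts.
    """
    count = 0
    for tok_a, tok_b in zip(a.split(), b.split()):
        if tok_a != tok_b:
            break
        count += 1
    return count


# Stand-in for a long, never-changing system prompt (250 whitespace tokens).
STATIC = "You are a support assistant. " * 50
# Stand-in for context that is stable within one session (3 tokens).
SESSION = "Customer plan: enterprise."

# Good ordering: variable content last.
good_1 = build_prompt(STATIC, SESSION, "How do I reset my password?")
good_2 = build_prompt(STATIC, SESSION, "Where is my invoice?")

# Bad ordering: a per-request timestamp ahead of the static content.
bad_1 = "Time: 2026-06-19T16:22:53Z\n\n" + good_1
bad_2 = "Time: 2026-06-19T16:22:54Z\n\n" + good_2

print(shared_prefix_tokens(good_1, good_2))  # 253: static + session segments match
print(shared_prefix_tokens(bad_1, bad_2))    # 1: only the literal "Time:" label matches
```

With the timestamp in front, the two requests diverge at the second token, so nothing after it can be served from cache even though the rest of the prompt is identical.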
Journey Context:
Prompt caching is prefix-based, not content-based. The cache key is the exact token sequence from the beginning of the prompt. If your system prompt is 5K tokens but the current date sits at token position 10, the cache breaks on every request. ROI math: Anthropic charges a 25% surcharge on the request that writes to the cache, then gives a 90% discount on cached tokens. With the same prefix, caching is already cheaper by request \#2: 1.25x \+ 0.1x = 1.35x the base prefix cost, versus 2x uncached. Common mistakes: \(1\) 'I'll cache everything': no, only the static prefix caches; \(2\) 'I reordered my system prompt and costs went up': yes, you changed the prefix and invalidated the cache; \(3\) 'Cache should work across different users': it does, as long as the prefix is identical. Optimal pattern: system prompt \(5K tokens, never changes\) → session context \(2K tokens, changes per session but reused within a session\) → user message \(variable\). After the first request, this yields 5K cached tokens on every request and a further 2K cached within a session.
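The pricing arithmetic can be checked with a short sketch. It uses only the two multipliers quoted above (25% write surcharge, 90% read discount) and measures cost in base-input-token equivalents. The function names are hypothetical, and it assumes every request after the first hits the cache, so cache expiry between requests is ignored.

```python
from typing import Optional

CACHE_WRITE_MULTIPLIER = 1.25  # 25% surcharge on the request that writes the cache
CACHE_READ_MULTIPLIER = 0.10   # 90% discount on tokens read from the cache


def cost_without_cache(prefix_tokens: int, requests: int) -> float:
    """Prefix cost when every request pays full price for the prefix."""
    return float(prefix_tokens * requests)


def cost_with_cache(prefix_tokens: int, requests: int) -> float:
    """Prefix cost when request 1 writes the cache and the rest read it.

    Assumption: the cache never expires between requests.
    """
    if requests <= 0:
        return 0.0
    write = prefix_tokens * CACHE_WRITE_MULTIPLIER
    reads = prefix_tokens * CACHE_READ_MULTIPLIER * (requests - 1)
    return write + reads


def first_cheaper_request(prefix_tokens: int, limit: int = 100) -> Optional[int]:
    """Return the first request count at which caching costs less, if any."""
    for n in range(1, limit + 1):
        if cost_with_cache(prefix_tokens, n) < cost_without_cache(prefix_tokens, n):
            return n
    return None


for n in (1, 2, 3, 10):
    print(n, cost_without_cache(5000, n), cost_with_cache(5000, n))
print(first_cheaper_request(5000))
```

For a 5K-token prefix the single-request case costs more with caching (6250 vs 5000 token-equivalents), and the cached path is ahead from the second request onward. If real traffic leaves gaps long enough for the cache to expire, each expiry pays the write surcharge again, so the effective hit rate matters more than the nominal break-even.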
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:22:53.782772+00:00— report_created — created