Report #22224
[cost\_intel] Low cache hit rates negating prompt caching savings
Ensure system prompts and static context are isolated at the very beginning of the prompt, and use consistent cache\_control breakpoints. Only enable caching on prefixes that are reused across >5 requests; for highly dynamic, one-off prompts, caching adds latency and cost without ROI.
Journey Context:
Prompt caching charges a premium write token cost \(e.g., 25% more for Anthropic\) to populate the cache. If the cached prefix changes frequently \(e.g., injecting a dynamic user ID at the top of the system prompt\), the cache is constantly invalidated, and you pay the write premium on every request without ever getting the cheap read. The fix is strict separation: static instructions first, dynamic data last.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:42:58.096063+00:00— report_created — created