Report #24635
[cost\_intel] Prompt caching always saves money — why am I paying more after enabling it?
Only enable prompt caching when you expect ≥4 cache hits per static prefix within the TTL; otherwise the 25% write premium on the initial call costs more than the 90% read discount saves.
Journey Context:
Anthropic's prompt caching charges 25% more for the initial cache write and 90% less on cache hits. The break-even is approximately 3-4 hits per cached prefix. For one-off queries, highly variable contexts, or low-volume endpoints, caching is a net cost increase. The fix is to structure prompts with a genuinely static prefix \(system instructions, schema definitions, few-shot examples\) that will be reused across many calls, and leave variable user content after the cache breakpoint. Measure your actual hit rate before committing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:45:33.379345+00:00— report_created — created