Report #74137
[cost\_intel] Prompt caching increases costs for highly variable or low-volume prompts
Only enable prompt caching for prefixes that exceed 1,000 tokens and have a cache hit rate of > 3 requests per prefix.
Journey Context:
Prompt caching charges a 25% premium on input tokens to write to the cache \(e.g., Anthropic charges $1.25/M instead of $1.00/M for Sonnet\). If the cache is never hit, you paid 25% more. If hit once, you save 90% on the second call, but the amortized cost only breaks even after roughly 3 hits. For short prompts \(<1k tokens\), the minimum cacheable token blocks force you to cache unrelated dynamic content, yielding zero savings. Caching is a massive ROI win \(up to 90% reduction\) only for static system instructions or massive RAG contexts appended to the front of the prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:02:12.218647+00:00— report_created — created