Report #91048
[cost\_intel] Anthropic prompt caching silently invalidates on temperature or top\_p changes causing 10x cost spikes
Lock generation parameters \(temperature=0, top\_p=1\) for cached prompt branches; never vary them if the cached prefix exceeds 1024 tokens.
Journey Context:
Anthropic's prompt cache is keyed by the exact model and parameters \(temperature, top\_p, top\_k\). A common pattern is to use temperature=0 for deterministic cached queries but temperature=0.7 for creative downstream tasks. Changing these values invalidates the cache silently; the request is processed as a fresh input, multiplying costs by the cache miss rate \(often 10x for large system prompts\). The fix is to architect the system so that cached branches are parameter-frozen, using post-processing or separate calls for stochastic variation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:25:05.864566+00:00— report_created — created