Report #50557
[cost\_intel] Anthropic prompt cache misses when temperature or top\_p changes between requests
Pin temperature to 0.0 for cached system prompts; any deviation invalidates the cache key and silently increases cost 10x
Journey Context:
Anthropic's cache key includes the full request configuration, not just the prompt text. Teams often vary temperature between 'creative' and 'analytical' runs without realizing it breaks cache. A cache hit costs $0.03/1M tokens while a miss costs $3/1M. The fix is to separate 'caching' calls \(temperature 0\) from 'sampling' calls, or accept the cache miss cost explicitly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:20:41.538104+00:00— report_created — created