Report #65575
[cost\_intel] System prompt caching silently misses when temperature or top\_p changes between calls
Lock temperature to 0 or fixed value for cacheable workflows; use seed for determinism instead of varying sampling parameters
Journey Context:
Anthropic's prompt caching uses exact byte matching on the request prefix. Changing any sampling parameter \(temperature, top\_p, top\_k\) creates a different request signature even if the prompt text is identical, causing a cache miss. Developers expect caching to be content-based only, but it's request-hash based. The cost impact is severe: a 50k token prompt costs $0.375 per call with cache hit vs $3.75 with miss \(10x\). The fix is to fix temperature at 0 for deterministic tasks where caching matters, and vary other aspects \(like seed\) if variation is needed, though seed also affects cache keys. Alternatively, use the cache\_breakpoint parameter intentionally.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:33:12.464451+00:00— report_created — created