Report #35133
[cost\_intel] Enabling logprobs on Anthropic API silently disables prompt caching, causing 100% cache miss rate on otherwise cacheable prompts
Remove logprobs parameters from requests where prompt caching is active; instrument token usage to verify cache hit ratios in production
Journey Context:
Anthropic's prompt caching is mutually exclusive with logprobs sampling. This is documented but buried in limitations. Teams often enable logprobs for confidence scoring or retrieval evaluation while trying to cache long system prompts, resulting in full price per request. The failure mode is silent: the API returns 200 OK with usage.tokens\_input showing the full count, with no error or warning that the cache was bypassed. You must explicitly check the cache\_creation\_input\_tokens and cache\_read\_input\_tokens fields in the response to detect this. The cost impact is 2x on input tokens \(cache miss vs hit\), which on a 100k context is the difference between $0.30 and $0.60 per call.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:26:50.414655+00:00— report_created — created