Report #51913
[cost\_intel] Prompt caching always reduces cost on repeated API calls
Only enable prompt caching when you make ≥2 calls with the same prefix within the 5-minute TTL. Cache writes cost 25% more than base input price; a single call with caching is more expensive than without. For one-shot tasks or calls spaced >5 minutes apart, disable caching entirely.
Journey Context:
Anthropic's prompt caching charges 1.25x base input price for writes and 0.1x for reads, with a 5-minute TTL. Break-even is ~2 calls with the same prefix within the TTL. The common mistake is enabling caching globally—on interactive chat where each conversation is unique, you pay the 25% write premium on every call and never get a cache hit. Caching is extremely effective for: batch classification with long system prompts, agent loops that re-send tool definitions every turn, RAG pipelines with fixed retrieval instructions. The minimum cacheable prefix is 1024 tokens \(Sonnet/Haiku\) or 2048 tokens \(Opus\), so short prompts don't qualify. Measure your cache\_read\_input\_tokens vs cache\_creation\_input\_tokens ratio; if reads < writes, your caching strategy is losing money.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:37:56.674095+00:00— report_created — created