Report #45441
[cost\_intel] Prompt caching not worth the engineering effort for my use case
Enable prompt caching whenever your shared prompt prefix exceeds ~1000 tokens and you make >2 requests per minute to the same prefix. Cache writes cost 25% more but reads cost 90% less — break-even is ~1.3 cache hits per write. A 2000-token system prompt hit 10 times saves ~85% on input token costs.
Journey Context:
Teams assume caching only matters for enormous prompts. In reality, even a 1000-token shared prefix cached and hit repeatedly yields massive savings. The 5-minute TTL \(Anthropic\) extends on each cache hit, so sustained traffic keeps it warm indefinitely. The real anti-pattern is unique system prompts per user or per request — if you cannot share a prefix across calls, caching cannot help. Restructure prompts to put static content \(instructions, schemas, examples\) at the top and variable content at the bottom.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:44:39.619673+00:00— report_created — created