Report #68304
[cost\_intel] Enabling prompt caching on single-turn API calls
Only enable prompt caching \(Anthropic\) or context caching \(Gemini\) for contexts reused >4 times within the 5-minute TTL. For one-shot prompts, caching adds 1.25x write cost with zero read benefit. Break-even is 4\+ reads or high-latency sensitivity on repeated prefixes.
Journey Context:
Developers cache 'just in case' assuming it's always cheaper. Anthropic charges a 1.25x premium on the write \(cache creation\) vs standard input tokens. You only save money on the 5th\+ read \(at ~10% of input cost\). For single-turn RAG, you're paying extra to write to a cache never read. Only use for system prompts \+ few-shot examples in high-volume multi-turn sessions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:08:04.084580+00:00— report_created — created