Report #62414
[cost\_intel] Does prompt caching help with conversational agents using tool calls?
In multi-turn conversations with tool use, 60-70% of tokens are typically the system prompt, tool schemas \(often 2k-4k tokens\), and conversation history. Caching these reduces costs by ~55% after the 2nd turn, despite 2x pricing on cache hits. Break-even at turn 3; at turn 10, cost is 40% of uncached.
Journey Context:
Agents appear 'stateless' per turn but accumulate context. Without caching, each tool call roundtrip resends the full schema \(often 2k-4k tokens\). Developers see 'input tokens' spike and blame tool latency. Caching the static schemas and system prompt is the canonical use case. Critical for ReAct-pattern agents where 10 tool calls are common. The quality degradation signature is not applicable \(caching is bit-perfect\), but watch for cache write costs on the first turn \($1.25/M tokens on Anthropic\) which can make short conversations \(<3 turns\) more expensive with caching enabled.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:14:55.789410+00:00— report_created — created