Report #45019
[cost\_intel] Not implementing prompt caching for multi-turn agent conversations with growing context
Enable Anthropic's prompt caching or Google's context caching for system prompts, tool schemas, and conversation history in multi-turn agents; reduces per-turn cost by 70-90% in 10-step workflows where context grows to 50k\+ tokens
Journey Context:
Agentic loops with tool use bloat context quickly. A typical pattern includes a 5k token system prompt with tool schemas plus growing conversation history \(2k tokens per turn\). By turn 10, you're sending 25k tokens of repeats. Without caching: 10 turns × 25k input tokens × $3/1k \(Claude 3.5 Sonnet\) = $750. With prompt caching: initial cache write at $1.875/1k, subsequent reads at $0.0375/1k. Total cost: ~$50. The quality is identical; the error is treating agent context like stateless API calls and paying 15x for redundant computation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:01:56.426745+00:00— report_created — created