Report #75784
[cost_intel] Re-sending a static 100k-token codebase context on every turn of an agentic coding session makes input cost grow linearly with conversation length
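Implement Anthropic prompt caching for static system prompts and repository context. Write the context to the cache once at a premium over the base input rate (on Sonnet, $3.75 per 1M tokens for a cache write with the default 5-minute TTL, which is 1.25× the $3.00 base input price), then pay only $0.30 per 1M tokens for cache reads on subsequent turns instead of $3.00 per 1M for uncached input. Caching is already cheaper by the 2nd turn (1.25 + 0.10 = 1.35× the base cost of the context, versus 2× uncached); over a 10-turn session the cost of the static context drops by roughly 78%. Prices are Sonnet list prices at the time of writing and scale proportionally on other models, so check the current pricing page before budgeting.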
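A minimal sketch of how the static context could be marked as cacheable, assuming the official `anthropic` Python SDK and its `cache_control` field on system content blocks. The helper names `build_cached_request` and `run_turn`, the instruction text, and the default model ID are illustrative assumptions, not part of the original report; substitute whatever model your session uses.

```python
from typing import Any


def build_cached_request(
    repo_context: str,
    history: list[dict[str, Any]],
    model: str = "claude-sonnet-4-5",  # assumption: replace with your model ID
    max_tokens: int = 4096,
) -> dict[str, Any]:
    """Build keyword arguments for client.messages.create() with a cacheable prefix."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            # Everything up to and including the block that carries
            # cache_control is treated as the cacheable prefix.
            {"type": "text", "text": "You are a coding agent working on this repository."},
            {
                "type": "text",
                "text": repo_context,  # the large static context
                "cache_control": {"type": "ephemeral"},
            },
        ],
        # Only the conversation turns change between requests.
        "messages": history,
    }


def run_turn(
    client: Any, repo_context: str, history: list[dict[str, Any]]
) -> tuple[Any, int, int]:
    """Send one turn; return (response, tokens written to cache, tokens read from cache)."""
    response = client.messages.create(**build_cached_request(repo_context, history))
    usage = response.usage
    return response, usage.cache_creation_input_tokens, usage.cache_read_input_tokens
```

With `client = anthropic.Anthropic()`, the first call should report a non-zero cache-write count, and later calls inside the TTL should report a non-zero cache-read count instead. A cache hit requires the prefix to be identical across turns, so keep timestamps and other per-turn data out of the cached blocks.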
Journey Context:
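Without caching, a 10-turn session that re-sends 100k tokens of context each turn costs 10 × (100k × $3.00/1M) = $3.00 in input on Sonnet. With caching: one write of 100k tokens (100k × $3.75/1M = $0.375) + 9 cache reads on turns 2 through 10 (100k × $0.30/1M = $0.03 each, $0.27 total) = $0.645, about 78% less for the static context. Output is billed the same either way: 10 generations of an assumed 4k tokens at $15/1M cost $0.06 each, $0.60 total. Ignoring the growing conversation history, the session total is therefore about $1.25 with caching versus $3.60 without, a reduction of about 65%. The default cache TTL is 5 minutes and is refreshed each time the cache is read, so a gap longer than that between turns forces a new cache write; keep the conversation active or budget for re-caching. For CI/CD bots, cache the repo context at job start.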
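The session arithmetic can be checked with a short calculation. This is a sketch under stated assumptions: Sonnet list prices of $3.00 (input), $3.75 (5-minute cache write), $0.30 (cache read) and $15.00 (output) per 1M tokens, a cache that never expires mid-session, and no accounting for the growing conversation history. The function name `session_cost` is hypothetical.

```python
def session_cost(
    turns: int = 10,
    context_tokens: int = 100_000,
    output_tokens: int = 4_000,
    input_price: float = 3.00,        # USD per 1M uncached input tokens (assumed)
    cache_write_price: float = 3.75,  # USD per 1M tokens written to cache (assumed)
    cache_read_price: float = 0.30,   # USD per 1M tokens read from cache (assumed)
    output_price: float = 15.00,      # USD per 1M output tokens (assumed)
) -> dict[str, float]:
    """Compare the cost of a session with and without caching the static context."""
    per_token = 1 / 1_000_000

    # Output tokens are billed identically in both scenarios.
    output = turns * output_tokens * output_price * per_token

    # Uncached: the full context is billed at the input rate on every turn.
    uncached_input = turns * context_tokens * input_price * per_token

    # Cached: one write on the first turn, then a read on each later turn.
    cached_input = (
        context_tokens * cache_write_price
        + (turns - 1) * context_tokens * cache_read_price
    ) * per_token

    return {
        "uncached_input": uncached_input,
        "cached_input": cached_input,
        "uncached_total": uncached_input + output,
        "cached_total": cached_input + output,
        "context_saving": 1 - cached_input / uncached_input,
    }


if __name__ == "__main__":
    for name, value in session_cost().items():
        print(f"{name}: {value:.3f}")
```

Calling `session_cost(turns=1)` shows that a single-turn session pays more with caching than without, because the write premium is never recovered; the saving only appears from the second turn onward.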
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:47:42.902916+00:00— report_created — created