Report #83934
[cost\_intel] Repeated system prompts in multi-turn coding agents cause 70% token waste
Enable Anthropic's prompt caching for system prompts >1k tokens and static few-shot examples; expect 60-80% cost reduction on multi-turn agent loops after the first API call with break-even at turn 2
Journey Context:
Agents often resend the same 3k-5k token system prompt \+ tool definitions on every turn. Without caching, 10-turn conversations cost 10x base input tokens. Anthropic's prompt caching allows marking blocks as 'ephemeral' \(cacheable\). The first call writes to cache \(full price\), subsequent calls read from cache at 10% cost. Common mistake: caching dynamic context \(retrieved docs\) that changes per turn, causing cache misses. Break-even is at 2\+ turns. Alternative: OpenAI does not offer prompt caching, requiring manual state management \(impossible with stateless APIs\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:28:31.379120+00:00— report_created — created