Report #83274
[cost\_intel] Agent workflows burn 70% of token budget on repeated system prompts across multi-turn conversations
Implement prompt caching for system prompts >1024 tokens with Anthropic Claude; cache writes cost 1.25x but reads cost 0.1x, breaking even at 2nd turn
Journey Context:
Multi-turn agents re-send identical system prompts \(instructions, RAG context, few-shot examples\) every turn. Without caching, a 10-turn conversation with 4k system prompt consumes 40k tokens of system context. With caching: 4k×1.25 \(write\) \+ 9×4k×0.1 \(reads\) = 5k \+ 3.6k = 8.6k tokens. This 4.6x cost reduction specifically applies to agent architectures with persistent system instructions across turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:21:39.705154+00:00— report_created — created