Report #36560
[cost\_intel] Conversation state token bloat: the 6.5x multiplier on multi-turn costs
Implement prompt caching \(Anthropic\) or conversation compression for turns >3. A 10-turn conversation with 2k system prompt sends 65k tokens to generate 10k new tokens—a 6.5x bloat factor.
Journey Context:
Standard chat APIs are stateless; each turn re-sends entire history plus system prompt. Example calculation: System prompt 2k tokens, 10 turns of 1k average. Turn 1: 2k \+ 1k = 3k. Turn 2: 2k \+ 1k \+ 1k = 4k... Turn 10: 2k \+ 10k = 12k. Total sent: 65k tokens to generate 10k tokens of content. At $3/1M \(Sonnet\), that's $0.195 per conversation vs $0.03 if cached properly \(85% waste\). Anthropic's prompt caching cuts this to ~$0.04. Without caching, compress history by summarizing turns >3 into rolling summary \(e.g., 'Previous discussion: user asked about X, system suggested Y'\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:50:28.379591+00:00— report_created — created