Report #24612
[cost_intel] Unexpected cost explosion in long-running chat sessions with Claude 3.5 Sonnet
Implement sliding-window truncation at 8k tokens with periodic summarization. Sending a full 100k+ token history costs $0.30+ per request in input tokens alone at $3/1M pricing, so unbounded history is more than 10x as expensive as windowing with compressed checkpoints (about 12.5x when comparing 100k tokens against 8k).
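The per-request figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the $3 per 1M input-token price quoted in this report; the function name `input_cost` is illustrative only, and output tokens are not counted.

```python
# Price quoted in this report for Claude 3.5 Sonnet input tokens (USD per 1M tokens).
INPUT_PRICE_PER_MTOK = 3.00


def input_cost(tokens: int, price_per_mtok: float = INPUT_PRICE_PER_MTOK) -> float:
    """Input-token cost in USD of one request that carries `tokens` of context."""
    return tokens * price_per_mtok / 1_000_000


full = input_cost(100_000)    # unbounded history at 100k tokens
windowed = input_cost(8_000)  # sliding window plus summary, capped at 8k tokens

print(f"full: ${full:.3f}  windowed: ${windowed:.3f}  ratio: {full / windowed:.1f}x")
```

At these sizes the ratio works out to 12.5x; it grows further as the unbounded history approaches the 200k limit.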
Journey Context:
Developers often send the full conversation history to maintain context without realizing what it costs. Claude 3.5 Sonnet has a 200k-token context window and input pricing of $3/1M tokens, so a 100k-token history costs $0.30 per API call in input tokens alone, before the new user message is even counted. For a chatbot with 1,000 daily active users averaging 10 turns each, that is up to 10,000 calls and as much as $3,000/day just in context retention if every call carries the full 100k tokens. The pattern is to use a sliding window (the last 4k tokens of raw history) plus a compressed summary of older turns (regenerated every 10 turns), keeping total context under 8k tokens unless the specific task requires full historical precision (e.g., legal document review).
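The windowing pattern described above can be sketched as follows. This is a rough illustration under stated assumptions, not a tested production implementation: `estimate_tokens` uses a crude 4-characters-per-token heuristic rather than a real tokenizer, `WindowedHistory` is a hypothetical class name, and `summarize` is a placeholder callable that you would back with your own summarization call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Crude heuristic (about 4 characters per token); replace with a real tokenizer."""
    return max(1, len(text) // 4)


@dataclass
class WindowedHistory:
    # Hypothetical summarizer: (previous_summary, old_turns) -> new summary text.
    summarize: Callable[[str, List[str]], str]
    window_tokens: int = 4_000   # raw-history budget from the report
    summarize_every: int = 10    # compress older turns every N turns
    summary: str = ""
    turns: List[str] = field(default_factory=list)  # turns not yet summarized
    _since_summary: int = 0

    def add_turn(self, text: str) -> None:
        self.turns.append(text)
        self._since_summary += 1
        if self._since_summary >= self.summarize_every:
            self._compress()

    def _recent(self) -> List[str]:
        """Newest turns that fit in the raw window, in chronological order."""
        kept: List[str] = []
        used = 0
        for turn in reversed(self.turns):
            cost = estimate_tokens(turn)
            # Always keep the newest turn, even if it alone exceeds the budget.
            if kept and used + cost > self.window_tokens:
                break
            kept.append(turn)
            used += cost
        return kept[::-1]

    def _compress(self) -> None:
        """Fold turns that fell out of the window into the running summary."""
        keep = self._recent()
        old = self.turns[: len(self.turns) - len(keep)]
        if old:
            self.summary = self.summarize(self.summary, old)
            self.turns = keep
        self._since_summary = 0

    def context(self) -> List[str]:
        """Summary (if any) followed by the raw window; send this, not the full log."""
        head = [f"Summary of earlier conversation: {self.summary}"] if self.summary else []
        return head + self._recent()


# Stand-in summarizer for demonstration only; a real one would call a model.
def stub_summarizer(previous: str, old_turns: List[str]) -> str:
    return (previous + f" [{len(old_turns)} earlier turns condensed]").strip()


history = WindowedHistory(summarize=stub_summarizer, window_tokens=10, summarize_every=3)
for i in range(3):
    history.add_turn(f"turn{i} " + "x" * 20)
print(history.context())
```

Two limitations of this sketch: it does not cap the size of the summary itself, so the 8k total is a target rather than a guarantee, and turns that fall out of the raw window between summarization passes are absent from `context()` until the next pass folds them in. Lowering `summarize_every` narrows that gap at the cost of more summarization calls.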
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:43:27.319310+00:00— report_created — created