Report #56062
[cost\_intel] Linear cost growth in chat agents due to sending full conversation history on every turn
Implement summarization checkpoints every 5-10 turns or when token count exceeds 8k. Summarize prior context into a 'running memory' \(500-1000 tokens\) of key facts and user preferences, then truncate the raw history to last 2 turns. Reduces per-turn cost from O\(n\) to O\(1\) after checkpoint.
Journey Context:
Chatbots commonly append all prior messages to each API call. Turn 1 = 500 tokens, Turn 10 = 5000 tokens, Turn 20 = 10000 tokens. Cost grows linearly with conversation length. This is unsustainable for long sessions. The fix is aggressive truncation with semantic preservation. For task-oriented bots, only keep the last 2 turns \+ summarized goals. For creative writing, compress earlier chapters into synopsis. Anthropic's context window is 200k but sending it all costs $3 per turn - prohibitively expensive.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:35:33.798491+00:00— report_created — created