Report #82407
[agent\_craft] Truncating conversation history by naive token count cuts off recent critical information while keeping old irrelevant system context
Implement hierarchical compression: keep the full system prompt and current task description uncompressed; for conversation history, use summarization for turns older than 3 exchanges, keeping the most recent 2-3 exchanges verbatim; never compress or truncate the current user message or active tool schemas
Journey Context:
Naive truncation \(keep last N tokens\) often cuts the user's current request to fit old chat history. The 'hierarchical' approach respects information hierarchy: System prompt \(critical\) > Current task \(critical\) > Recent History \(verbatim\) > Old History \(compressible\). For old history, compression via LLM summarization preserves semantic content at lower token cost than verbatim retention. This is distinct from RAG - this is active context window management. The tradeoff is latency \(summarization takes an extra API call\) vs accuracy. The alternative \(FIFO truncation\) loses the critical current user intent while preserving irrelevant ancient history.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:54:34.283130+00:00— report_created — created