Report #60717
[frontier] Context windows overflow with redundant history, causing truncation of critical system prompts or recent tool outputs
Implement hierarchical token budgeting using LLMLingua-2 to compress historical turns while preserving recent high-fidelity context and system instruction integrity
Journey Context:
Developers often use 'last 10 messages' or 'sliding window' truncation. Both cut by message count or position rather than by importance, so they can drop crucial early instructions, and a single large tool result can still overflow the window. The production pattern emerging in 2025 (exemplified by LLMLingua-2 and implemented in frameworks like LangChain's contextual compression) uses learned compression to condense older messages into scratchpad summaries while keeping recent turns verbatim. The token budget is maintained explicitly in three tiers: system prompt (reserved), recent N turns (uncompressed), older history (compressed). Tradeoff: added compression latency vs. reduced token cost. Alternative: RAG over chat history, but that loses conversational flow. This wins because the budget check is deterministic, so context limits are always respected, while information density is maximized within them, which is critical for long-running agents.
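The three-tier budget described above can be sketched roughly as follows. This is a minimal illustration, not the LLMLingua-2 or LangChain implementation: `Message`, `count_tokens`, `naive_compress`, and `build_context` are hypothetical names, the whitespace token count stands in for a real tokenizer, and the word-truncating compressor is only a placeholder for a learned one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    content: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a crude stand-in for a real tokenizer.
    return len(text.split())


def naive_compress(text: str, target_tokens: int) -> str:
    # Placeholder for a learned compressor: keeps only the first target_tokens words.
    return " ".join(text.split()[:max(target_tokens, 0)])


def build_context(
    system_prompt: str,
    history: List[Message],
    max_tokens: int,
    recent_turns: int = 4,
    compress: Callable[[str, int], str] = naive_compress,
) -> List[Message]:
    # Tier 1: the system prompt is reserved first and never compressed.
    system_cost = count_tokens(system_prompt)
    if system_cost > max_tokens:
        raise ValueError("system prompt alone exceeds the token budget")
    remaining = max_tokens - system_cost

    # Tier 2: keep the most recent turns verbatim, newest first, while they fit.
    recent: List[Message] = []
    candidates = history[-recent_turns:] if recent_turns > 0 else []
    for msg in reversed(candidates):
        cost = count_tokens(msg.content)
        if cost > remaining:
            break
        recent.insert(0, msg)
        remaining -= cost

    # Tier 3: everything older is compressed into whatever budget is left.
    older = history[: len(history) - len(recent)]
    context = [Message("system", system_prompt)]
    if older and remaining > 0:
        joined = "\n".join(f"{m.role}: {m.content}" for m in older)
        summary = compress(joined, remaining)
        # Enforce the budget even if the supplied compressor overshoots.
        if count_tokens(summary) > remaining:
            summary = naive_compress(summary, remaining)
        if summary:
            # "summary" is a hypothetical role name used only in this sketch.
            context.append(Message("summary", summary))
    context.extend(recent)
    return context


def total_tokens(context: List[Message]) -> int:
    return sum(count_tokens(m.content) for m in context)
```

The `compress` parameter is where a learned compressor would plug in, for example a thin wrapper around the `llmlingua` package's `PromptCompressor`; that wiring is an assumption and is not exercised here. The final budget check runs regardless of which compressor is supplied, which is what keeps the limit deterministic.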
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:23:54.567160+00:00— report_created — created