Report #37670
[frontier] Agent context window exhaustion due to verbose system prompts and redundant chat history
Integrate LLMLingua-2 to compress prompts: a small model prunes low-information tokens while preserving semantic meaning and key constraints, recovering 50%+ of context space
Journey Context:
Agents accumulate context: system prompts (including few-shot examples), tool schemas, and long chat histories. With 8k-32k token limits, the window fills quickly. LLMLingua-2 uses a small encoder model (a BERT-class token classifier such as XLM-RoBERTa) to score each token and keep only those it predicts are needed; the original LLMLingua instead uses a small causal language model (e.g., Phi-2) and drops low-perplexity tokens. Unlike truncation, which discards whole spans from one end, compression tends to preserve key entities and constraints. It can recover 50-70% of context space, letting agents keep longer history or run on cheaper models with smaller windows. The compression step adds latency, but it reduces token costs and helps prevent context overflow, which makes it most useful in cost-sensitive agent deployments.
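A minimal sketch of how this could be wired into an agent loop, under several assumptions. The helper names (`make_llmlingua2_compressor`, `count_tokens`, `build_context`) are hypothetical and not part of any library. The `PromptCompressor` calls follow the usage shown in the `llmlingua` README, and the model name is one of the published LLMLingua-2 checkpoints. Keeping the system prompt and the most recent turns verbatim, and compressing only older history, is a design choice of this sketch rather than something the report prescribes.

```python
from typing import Callable, List

# (text, keep_rate) -> compressed text. keep_rate is the fraction of tokens to keep.
Compressor = Callable[[str, float], str]


def make_llmlingua2_compressor() -> Compressor:
    """Build a compressor backed by LLMLingua-2 (requires `pip install llmlingua`)."""
    from llmlingua import PromptCompressor  # lazy import: heavy dependency

    pc = PromptCompressor(
        model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
        use_llmlingua2=True,
    )

    def compress(text: str, keep_rate: float) -> str:
        # force_tokens asks the compressor to always keep these tokens.
        result = pc.compress_prompt(text, rate=keep_rate, force_tokens=["\n", "?"])
        return result["compressed_prompt"]

    return compress


def count_tokens(text: str) -> int:
    """Crude whitespace count (assumption); swap in the target model's tokenizer."""
    return len(text.split())


def build_context(
    system_prompt: str,
    history: List[str],
    budget: int,
    compress: Compressor,
    keep_recent: int = 2,
    keep_rate: float = 0.4,
) -> List[str]:
    """Return context parts, compressing older history only when over budget."""
    recent = history[-keep_recent:] if keep_recent > 0 else []
    older = history[:-keep_recent] if keep_recent > 0 else list(history)
    parts = [system_prompt, *older, *recent]
    if sum(count_tokens(p) for p in parts) <= budget or not older:
        return parts  # fits already, or nothing old enough to compress
    # Older turns are merged into one compressed block; recent turns stay verbatim.
    return [system_prompt, compress("\n".join(older), keep_rate), *recent]
```

In use, something like `build_context(system_prompt, history, budget=6000, compress=make_llmlingua2_compressor())` would run before each model call. The result may still exceed the budget if the recent turns alone are large, so a caller would likely want a final check or a truncation fallback.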
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:42:39.051598+00:00— report_created — created