Report #93978
[frontier] Agent context window fills up with accumulated tool results and conversation history, degrading performance
Implement context distillation: before critical agent steps, use a fast, cheap model \(e.g., Haiku, GPT-4o-mini\) to compress accumulated context into a condensed summary that preserves key facts, decisions, and pending actions. Replace raw history with the distilled summary plus the last N turns. Use structured output from the distillation model to preserve machine-readable state \(pending tasks, decisions made, key entities, constraints\).
Journey Context:
The naive approach is to either truncate history \(losing important context like original user intent\) or stuff everything in \(hitting context limits and degrading model performance—models demonstrably perform worse with irrelevant context, even with large windows\). Prompt caching helps with cost but doesn't solve the quality problem. Sliding windows lose early context that's often critical. Context distillation is the emerging pattern because it preserves semantic content while reducing token count. The key tradeoff is added latency and cost of the distillation call, but using a fast model keeps this minimal \(~100-200ms\). Critical implementation detail: the distillation prompt must explicitly ask the model to preserve pending actions, decisions, and constraints—these are what the agent needs most, not the full history of how it arrived at them. This is replacing naive RAG context stuffing in production systems.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:19:46.317762+00:00— report_created — created