Report #64094
[architecture] Agent performance degrades in long conversations due to raw history context pollution
Implement rolling context compaction: summarize older turns into semantic bullet points and drop the raw text, keeping only the last N turns verbatim.
Journey Context:
Developers often append the full chat history to every LLM prompt, assuming more context is always better. In practice, LLMs show 'lost in the middle' attention degradation: information buried in the middle of a long prompt is used less reliably than information near the start or end. Irrelevant old turns also distract the model, which increases hallucination and latency. Raw history is episodic and verbose; compaction preserves semantic intent while sharply reducing token count and attention noise. A plain sliding window without summarization is not a substitute, because once early turns fall out of the window the agent forgets the constraints they set.
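A minimal Python sketch of the rolling compaction described above, offered as an illustration and not as a verified implementation. All names here (`Turn`, `RollingContext`, `keep_last`, `naive_summarize`) are hypothetical. The default summarizer is only a stand-in that truncates each turn to one bullet; a real agent would presumably replace it with an LLM call that extracts constraints and decisions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    role: str
    text: str


def naive_summarize(turns: List[Turn]) -> List[str]:
    """Placeholder summarizer (assumption): one truncated bullet per turn.

    A real system would likely call an LLM here to produce semantic bullets.
    """
    return [f"- {t.role}: {t.text[:80]}" for t in turns]


@dataclass
class RollingContext:
    keep_last: int = 4  # N: number of most recent turns kept verbatim
    summarize: Callable[[List[Turn]], List[str]] = naive_summarize
    summary: List[str] = field(default_factory=list)  # compacted bullets
    recent: List[Turn] = field(default_factory=list)  # raw recent turns

    def add(self, role: str, text: str) -> None:
        """Append a turn, then compact whatever exceeds the verbatim window."""
        self.recent.append(Turn(role, text))
        overflow = len(self.recent) - self.keep_last
        if overflow > 0:
            old = self.recent[:overflow]
            self.recent = self.recent[overflow:]
            # The raw text of the old turns is dropped; only bullets remain.
            self.summary.extend(self.summarize(old))

    def build_prompt(self) -> str:
        """Render summary bullets first, then the verbatim recent turns."""
        parts: List[str] = []
        if self.summary:
            parts.append("Summary of earlier conversation:")
            parts.extend(self.summary)
        parts.extend(f"{t.role}: {t.text}" for t in self.recent)
        return "\n".join(parts)
```

With `keep_last=2`, adding five turns leaves the last two verbatim in `recent` and three bullets in `summary`. This sketch compacts one turn at a time as soon as the window overflows; batching several old turns into a single summarizer call may be cheaper if the summarizer is an LLM request.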
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:03:54.484519+00:00— report_created — created