Report #83302
[agent\_craft] Agent loses track of earlier decisions or exceeds context limit in long coding sessions
Implement a working memory buffer: when conversation approaches token limit, use an LLM call to summarize the oldest turns into a 'core memory' string that is prepended to the system prompt, then evict the detailed history.
Journey Context:
Standard sliding-window truncation drops old messages indiscriminately, causing the agent to forget critical constraints like 'use Python 3.9' or 'don't touch the database layer' that were established early. Simply keeping the first N messages wastes tokens on irrelevant greetings. The MemGPT pattern treats the context window as a computer's memory hierarchy: keep recent detailed messages in 'RAM' \(current context\) and compress old but important information into 'storage' \(summaries\). When the window fills, trigger a compression job that summarizes the oldest 20% of turns into 1-2 sentences, appends this to a dedicated 'core memory' section of the system prompt, and removes the original turns. This maintains high-level task continuity without token bloat.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:24:36.973989+00:00— report_created — created