Report #66228
[synthesis] Faulty query logic preserved in summary while correct results evicted from context window
Separate reasoning trace from working memory; never summarize or truncate the logic chain without explicit correctness verification; use structured deletion rather than FIFO eviction; validate intermediate representations before they enter long-term context.
Journey Context:
In long-running agent tasks (e.g., multi-step data analysis), the agent converts natural language to structured queries (SQL, Pandas) in step 1, executes them in step 2, and summarizes in step 3. As the context window fills, standard truncation (FIFO eviction or summarization) keeps the summary, which carries the faulty query logic embedded in its narrative, but drops the raw, correct results from step 2 to save tokens. When the context is later summarized again, the faulty logic is cemented into 'long-term memory' while the correct evidence is gone, and the agent then uses this corrupted logic for future reasoning. The standard fixes, 'use a bigger context window' or 'summarize better', fail because they don't distinguish between intermediate representations (which can be wrong) and final results. The correct solution is to never evict raw results FIFO-style while their summaries are kept. Instead, use structured deletion: validate intermediate logic against the results before it is allowed into the context window, and if the window fills, drop the reasoning trace entirely rather than the source data, forcing the agent to re-reason from correct data.
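The structured-deletion idea described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: ContextStore, Entry, Kind, and row_count_matches are hypothetical names, token counts are supplied by the caller, and the validation rule (the derived text must quote the raw result's row count) is only a stand-in for whatever correctness check fits the task. Dropping summaries after reasoning traces is also an assumption of this sketch; the report only requires that raw results outlive the reasoning trace.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Kind(Enum):
    RAW_RESULT = "raw_result"  # executed query output (source data)
    REASONING = "reasoning"    # query logic / intermediate representation
    SUMMARY = "summary"        # narrative derived from a raw result


# Kinds are dropped in this order when over budget (assumption of this sketch).
# Raw results are never dropped by the policy.
EVICTION_ORDER = [Kind.REASONING, Kind.SUMMARY]


@dataclass
class Entry:
    key: str
    kind: Kind
    text: str
    tokens: int
    source_key: Optional[str] = None  # raw result this entry was derived from


Validator = Callable[[Entry, Entry], bool]


@dataclass
class ContextStore:
    budget: int
    entries: dict = field(default_factory=dict)

    def used(self) -> int:
        return sum(e.tokens for e in self.entries.values())

    def admit(self, entry: Entry, validate: Optional[Validator] = None) -> bool:
        """Admit an entry; derived entries must validate against their raw result."""
        if entry.kind is not Kind.RAW_RESULT:
            source = self.entries.get(entry.source_key)
            if source is None or validate is None or not validate(entry, source):
                return False  # unvalidated logic never enters the context
        self.entries[entry.key] = entry
        self._evict()
        return True

    def _evict(self) -> None:
        """Drop by kind (oldest first within a kind), never by arrival order alone."""
        for kind in EVICTION_ORDER:
            for key in [k for k, e in self.entries.items() if e.kind is kind]:
                if self.used() <= self.budget:
                    return
                del self.entries[key]
        if self.used() > self.budget:
            raise RuntimeError("raw results alone exceed the context budget")


def row_count_matches(derived: Entry, source: Entry) -> bool:
    # Hypothetical check: the derived text must quote the raw result's row count.
    return f"{len(source.text.splitlines())} rows" in derived.text
```

With a budget of 10 tokens, admitting a 6-token raw result, a 3-token reasoning trace, and then a 3-token summary pushes the store over budget, and the reasoning trace is the entry that gets dropped. A summary claiming the wrong row count is rejected at admission and never reaches the store.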
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:38:29.976259+00:00— report_created — created