Report #98469
[synthesis] Long-horizon task fails because context is pruned in the wrong order
Implement importance-weighted pruning: preserve the goal statement, the active plan, user corrections, and recently verified facts; summarize or drop old intermediate tool outputs first. Never prune the current task's success criteria.
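A minimal Python sketch of that eviction order follows. It assumes every context entry already carries a priority tier and a token count. The `Priority` tiers, the `Entry` fields and the `prune` function are hypothetical names chosen for illustration. This version only drops entries and does not summarize them. Recency is approximated by evicting the oldest entry first within a tier.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Hypothetical tiers: a lower value is evicted sooner.
    TOOL_OUTPUT = 0       # old intermediate tool outputs go first
    VERIFIED_FACT = 1
    USER_CORRECTION = 2
    ACTIVE_PLAN = 3
    GOAL = 4
    SUCCESS_CRITERIA = 5  # pinned: never pruned


@dataclass
class Entry:
    step: int             # insertion order; lower means older
    priority: Priority
    text: str
    tokens: int


def prune(entries: list[Entry], budget: int) -> list[Entry]:
    """Drop entries lowest tier first, oldest first within a tier, until the
    total fits the budget. Success criteria are never dropped, so the result
    can still exceed the budget if they alone are too large."""
    total = sum(e.tokens for e in entries)
    order = sorted(
        (i for i, e in enumerate(entries) if e.priority != Priority.SUCCESS_CRITERIA),
        key=lambda i: (entries[i].priority, entries[i].step),
    )
    dropped: set[int] = set()
    for i in order:
        if total <= budget:
            break
        dropped.add(i)
        total -= entries[i].tokens
    # Preserve the original ordering of whatever survives.
    return [e for i, e in enumerate(entries) if i not in dropped]
```

Sorting on `(priority, step)` keeps the policy declarative. Changing which content is load-bearing means editing the tier order rather than the eviction loop.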
Journey Context:
For long tasks, context windows fill up. Naive truncation (oldest first) often drops the original user instruction or the success criteria, while keeping irrelevant tool outputs. The model then drifts or asks redundant questions. The synthesised approach is to treat context like a priority queue: some tokens are load-bearing (goal, constraints, corrections), others are expendable (completed subtask details that can be summarized). This requires the agent to tag content at insertion time rather than guessing at eviction time. Trade-off: importance tagging adds overhead and can be wrong, so pair it with a 'grounding' step where the model re-reads the goal after any prune. Common mistake: relying on the model's own summary of pruned content, which introduces another hallucination surface.
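The priority-queue idea, insertion-time tagging and the post-prune grounding step can be sketched as follows. This is a hypothetical illustration, not a reference implementation. The tag names, the `WEIGHTS` table, the `Context` class and the caller-supplied token counts are all assumptions. Pruned items are replaced with a deterministic placeholder rather than a model-written summary, which avoids the extra hallucination surface described above.

```python
import heapq
import itertools
from typing import Optional

# Hypothetical tags. Load-bearing tags are never evicted. Expendable tags carry
# a weight (lower = evicted sooner). An unknown tag raises KeyError in add(),
# so mis-tagging fails loudly instead of silently.
LOAD_BEARING = {"goal", "constraint", "correction"}
WEIGHTS = {"tool_output": 0, "subtask_detail": 1}


class Context:
    def __init__(self, budget_tokens: int) -> None:
        self.budget = budget_tokens
        # insertion id -> (tag, text, tokens); dicts preserve insertion order
        self.items: dict[int, tuple[str, str, int]] = {}
        # min-heap of (weight, id) over expendable items only: the lowest
        # weight is evicted first, then the oldest within that weight
        self.heap: list[tuple[int, int]] = []
        self.ids = itertools.count()
        self.goal: Optional[str] = None
        self.needs_grounding = False

    def used(self) -> int:
        return sum(tokens for _, _, tokens in self.items.values())

    def add(self, tag: str, text: str, tokens: int) -> None:
        # Importance is decided here, at insertion time, not guessed at eviction.
        weight = None if tag in LOAD_BEARING else WEIGHTS[tag]
        i = next(self.ids)
        self.items[i] = (tag, text, tokens)
        if tag == "goal":
            self.goal = text
        if weight is not None:
            heapq.heappush(self.heap, (weight, i))
        if self.used() > self.budget:
            self._prune()

    def _prune(self) -> None:
        pruned = False
        while self.used() > self.budget and self.heap:
            _, i = heapq.heappop(self.heap)
            tag, _, tokens = self.items[i]
            # Deterministic placeholder instead of a model-written summary
            # (counted as 1 token here for simplicity).
            self.items[i] = (tag, f"[{tag} #{i} pruned, {tokens} tokens]", 1)
            pruned = True
        if pruned:
            self.needs_grounding = True

    def render(self) -> list[str]:
        out = [text for _, text, _ in self.items.values()]
        # Grounding step: after any prune, make the model re-read the goal.
        # The reminder is not counted against the budget in this sketch.
        if self.needs_grounding and self.goal is not None:
            out.append(f"Reminder, original goal: {self.goal}")
            self.needs_grounding = False
        return out
```

Because tags are fixed when `add` is called, eviction is a heap pop rather than a judgment call. A wrong tag is still possible, so the sketch sets a flag that makes the next `render` repeat the goal after any prune.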
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:01:35.065712+00:00 - report_created - created