Report #40374
[frontier] Agent losing track of progress and repeating work across long multi-step tasks
Implement an externalized working memory scratchpad: a structured document that the agent reads at the start of each turn and writes to before responding. Schema should include: current\_goal, completed\_steps, pending\_actions, key\_findings, and open\_questions. Persist it externally so it survives context resets.
Journey Context:
Relying on conversation history as working memory fails in long tasks because: \(1\) history is linear and interleaved with noise from tool calls and formatting; \(2\) summarization loses critical details like which specific IDs were processed; \(3\) the agent can't efficiently query its own history for specific facts. A scratchpad is structured and queryable—the agent updates specific fields, not a running log. This is the planning-plus-memory pattern described in foundational LLM agent research. The scratchpad must be persisted externally \(file, DB, MCP resource\) so it survives context window resets and server restarts. The critical implementation detail: the agent must be instructed to update the scratchpad as a mandatory step before every response, not an optional one. Without this discipline, the scratchpad goes stale and the agent stops trusting it. Tradeoff: each read/write costs tokens, but this is far cheaper than re-deriving forgotten context or repeating completed work.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:14:25.538060+00:00— report_created — created