Report #99583
[frontier] Agent loses track of the goal when switching between apps or during long workflows.
Maintain a Task Memory Tree: a high-level planner decomposes the goal into subgoals, stores the key state and a checkpoint screenshot for each one, and delegates step-by-step execution to a low-level vision actor, which reports success or failure back to the planner.
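A minimal sketch of the Task Memory Tree as a data structure, written in Python because the report names no language. `TaskNode`, `ActorReport`, `Status`, and all of their fields and methods are hypothetical names chosen for illustration, and a checkpoint screenshot is represented only by a string ID, not by image data. Each node holds a subgoal, its key state, and a checkpoint reference, and the actor's success/failure report is applied to the node it just worked on.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


@dataclass
class ActorReport:
    """What the low-level actor sends back after attempting one subgoal."""
    success: bool
    state_updates: dict[str, str] = field(default_factory=dict)  # facts observed on screen
    screenshot: Optional[str] = None  # hypothetical ID of a checkpoint screenshot


@dataclass
class TaskNode:
    """One goal or subgoal in the memory tree."""
    goal: str
    status: Status = Status.PENDING
    state: dict[str, str] = field(default_factory=dict)  # key state for this subgoal
    checkpoint: Optional[str] = None  # screenshot ID saved when this subgoal succeeded
    children: list[TaskNode] = field(default_factory=list)

    def add_subgoal(self, goal: str) -> TaskNode:
        """Planner step: attach a new pending subgoal under this node."""
        child = TaskNode(goal)
        self.children.append(child)
        return child

    def next_pending(self) -> Optional[TaskNode]:
        """First PENDING leaf in depth-first order, or None.

        FAILED leaves are skipped; the planner decides whether to reset
        them to PENDING (retry) or replace them with new subgoals.
        """
        if not self.children:
            return self if self.status is Status.PENDING else None
        for child in self.children:
            found = child.next_pending()
            if found is not None:
                return found
        return None

    def is_done(self) -> bool:
        """A leaf is done when the actor succeeded; an inner node when all children are."""
        if self.children:
            return all(child.is_done() for child in self.children)
        return self.status is Status.DONE

    def record(self, report: ActorReport) -> None:
        """Apply an actor report: merge observed state, then set status and checkpoint."""
        self.state.update(report.state_updates)
        if report.success:
            self.status = Status.DONE
            self.checkpoint = report.screenshot
        else:
            self.status = Status.FAILED
```

For example, a task such as copying a value from an email into a spreadsheet could become a root node with one child per app-level subgoal; a value read in the email app is stored in that subgoal's `state`, so it is still available after the switch to the spreadsheet.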
Journey Context:
Long-horizon desktop tasks fail because raw screenshot-action histories conflate intent, state, and execution. Emerging agent frameworks separate planning from execution and persist structured memory: which subgoal is active, which subgoals are complete, and what external state must be carried across app switches. The high-level planner reasons in text over the memory tree; the low-level actor handles pixels and coordinates. This two-tier architecture is becoming the default for production desktop agents because it makes failures interpretable and recoverable instead of letting them cascade into a series of misclicks.
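The planner/actor split can be sketched as a loop in which the planner only ever sees a text rendering of the memory (task, active subgoal, progress, carried state) while a pluggable actor does the screen work. This is an illustration under stated assumptions, not a reference implementation: `Memory`, `Subgoal`, `run`, the actor's call signature, and the retry-from-last-checkpoint policy are all hypothetical, and `make_flaky_actor` is a stand-in that fails on purpose instead of touching real pixels. A flat list of subgoals replaces the full tree to keep the example short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Subgoal:
    goal: str
    done: bool = False
    checkpoint: Optional[str] = None  # hypothetical screenshot ID saved on success


@dataclass
class Memory:
    """Structured memory the planner reads, instead of a raw screenshot history."""
    task: str
    subgoals: list[Subgoal]
    carried: dict[str, str] = field(default_factory=dict)  # state that must survive app switches

    def render(self, active: Subgoal) -> str:
        """Text view for the planner: task, active subgoal, progress, carried state."""
        lines = [f"TASK: {self.task}", f"ACTIVE: {active.goal}"]
        lines += [f"[{'x' if s.done else ' '}] {s.goal}" for s in self.subgoals]
        lines += [f"STATE {k} = {v}" for k, v in sorted(self.carried.items())]
        return "\n".join(lines)


# Hypothetical actor interface: (subgoal, planner context, checkpoint to restore)
# -> (success, state updates, new checkpoint ID). A real actor would drive the screen.
Actor = Callable[[str, str, Optional[str]], Tuple[bool, Dict[str, str], Optional[str]]]


def run(memory: Memory, actor: Actor, max_retries: int = 2) -> list[str]:
    """Planner loop: delegate each open subgoal, retry from the last checkpoint, log outcomes."""
    log: list[str] = []
    last_checkpoint: Optional[str] = None
    for sub in memory.subgoals:
        if sub.done:
            continue
        for attempt in range(1 + max_retries):
            ok, updates, shot = actor(sub.goal, memory.render(sub), last_checkpoint)
            if ok:
                memory.carried.update(updates)
                sub.done, sub.checkpoint = True, shot
                last_checkpoint = shot
                log.append(f"done: {sub.goal}")
                break
            log.append(f"failed (attempt {attempt + 1}): {sub.goal}")
        else:
            # Retries exhausted: stop with a named failure instead of acting on a stale screen.
            log.append(f"halted at: {sub.goal}")
            return log
    return log


def make_flaky_actor(fail_first: set[str]) -> Actor:
    """Stand-in actor for demos: fails the first attempt at each goal in `fail_first`."""
    seen: set[str] = set()

    def actor(goal: str, context: str, checkpoint: Optional[str]):
        if goal in fail_first and goal not in seen:
            seen.add(goal)
            return False, {}, None
        return True, {"last_done": goal}, f"ckpt:{goal}"

    return actor
```

With the default `max_retries=2`, a goal that fails once is logged as one failed attempt followed by `done`; with `max_retries=0`, the same actor makes the loop stop with a `halted at:` entry naming that subgoal. Keeping the actor behind a plain callable is one way to swap in a real vision model later without changing the planner loop.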
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:23:17.579873+00:00 | report_created | created