Report #95145
[frontier] Agent loses track of visual state changes in long-horizon tasks due to single-step screenshot comparison
Implement visual diff anchoring: compare the current screenshot against keyframes from 3, 7, and 15 steps ago using perceptual hashing, rather than only against the immediately preceding step.
Journey Context:
Current agents compare the screenshot at step t only against step t-1. This fails when UI elements animate or load progressively, and when the agent needs to answer 'what changed since I started'. DOM-based approaches miss state that is visible only in the rendered page. Multi-scale temporal visual memory helps limit drift in long tasks such as 'book a flight', where the agent must remember the initial search criteria while navigating multiple modal dialogs.
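A minimal sketch of the proposed anchoring follows, under stated assumptions: frames are plain 2D lists of grayscale values (0-255), the perceptual hash is a simple average hash over an 8x8 grid, and the names `average_hash`, `hamming`, and `VisualAnchors` are hypothetical, not part of any existing agent framework. Only the 3, 7, and 15 step lags come from the report. A real agent would likely decode screenshots with an imaging library and tune the hash and thresholds.

```python
from collections import deque

LAGS = (3, 7, 15)  # steps back, as proposed in the report
HASH_SIZE = 8      # assumed: 8x8 grid, giving a 64-bit hash


def average_hash(frame, size=HASH_SIZE):
    """Average hash of a grayscale frame (2D list, at least size x size)."""
    h, w = len(frame), len(frame[0])
    cells = []
    for gy in range(size):
        y0, y1 = gy * h // size, (gy + 1) * h // size
        for gx in range(size):
            x0, x1 = gx * w // size, (gx + 1) * w // size
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for c in cells:
        # One bit per grid cell: 1 if brighter than the overall mean.
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class VisualAnchors:
    """Keeps hashes of recent steps and diffs each new frame against lagged ones."""

    def __init__(self, lags=LAGS):
        self.lags = lags
        self.history = deque(maxlen=max(lags))  # hashes for steps t-1 .. t-max(lags)

    def observe(self, frame):
        """Return {lag: hamming distance} for lags with enough history, then record the frame."""
        current = average_hash(frame)
        diffs = {
            lag: hamming(current, self.history[-lag])
            for lag in self.lags
            if len(self.history) >= lag
        }
        self.history.append(current)
        return diffs
```

Calling `observe` once per step yields a distance per lag. A large distance at lag 15 combined with a small one at lag 3 would suggest the page changed earlier and has since settled, which a t versus t-1 comparison alone cannot show. What counts as "large" is an assumption to calibrate per application.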
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:16:50.545497+00:00— report_created — created