Report #93738
[frontier] Screenshot-based agents lose spatial coherence when scrolling, causing repeated actions or navigation loops
Implement a persistent Spatial Memory Buffer that maintains a canvas-coordinate space across screenshots, anchoring elements by their absolute page coordinates rather than viewport-relative positions
Journey Context:
Agents that rely on pure screenshot pipelines often fail when scrolling because each screenshot is processed in isolation, so the model loses track of where it is on the page relative to previous actions. Fixed-count workarounds like 'scroll down 3 times' are brittle. The Spatial Memory Buffer creates a persistent 2D coordinate space (like a giant canvas) where each detected element is stored with absolute page coordinates. When a new screenshot arrives, it is aligned to this canvas using visual feature matching or scroll delta tracking. This prevents the 'groundhog day' effect, where agents scroll up and down repeatedly looking for the same element.
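A minimal sketch of the buffer described above, covering only the scroll delta tracking variant (visual feature matching is not shown). All names here (`Element`, `SpatialMemoryBuffer`, `apply_scroll`, `observe`, `to_viewport`, `scroll_delta_to`) are hypothetical and chosen for illustration. The sketch assumes the agent can read the actual scroll delta after each scroll action, and that each detected element has a label stable enough to serve as a key.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Element:
    """A detected UI element stored in absolute page coordinates."""
    label: str
    page_x: float
    page_y: float
    width: float
    height: float


@dataclass
class SpatialMemoryBuffer:
    """Hypothetical canvas-coordinate memory shared across screenshots."""
    viewport_height: float
    scroll_x: float = 0.0
    scroll_y: float = 0.0
    elements: Dict[str, Element] = field(default_factory=dict)

    def apply_scroll(self, dx: float, dy: float) -> None:
        # Assumption: dx/dy are the deltas the page actually moved by.
        # Clamp at 0 because a page cannot scroll past its origin.
        self.scroll_x = max(0.0, self.scroll_x + dx)
        self.scroll_y = max(0.0, self.scroll_y + dy)

    def observe(self, label: str, vx: float, vy: float,
                width: float, height: float) -> Element:
        # Convert viewport-relative coordinates to absolute page coordinates.
        el = Element(label, vx + self.scroll_x, vy + self.scroll_y,
                     width, height)
        self.elements[label] = el
        return el

    def to_viewport(self, label: str) -> Optional[Tuple[float, float]]:
        # Where the remembered element sits relative to the current viewport
        # (values may be negative or exceed the viewport if it is off-screen).
        el = self.elements.get(label)
        if el is None:
            return None
        return (el.page_x - self.scroll_x, el.page_y - self.scroll_y)

    def scroll_delta_to(self, label: str) -> Optional[float]:
        # Vertical scroll needed to bring a remembered element into view.
        # Returns 0.0 if it is already fully visible, None if never seen.
        el = self.elements.get(label)
        if el is None:
            return None
        top = self.scroll_y
        bottom = top + self.viewport_height
        if el.page_y >= top and el.page_y + el.height <= bottom:
            return 0.0
        # Aim to center the element vertically, without scrolling above 0.
        target = el.page_y + el.height / 2 - self.viewport_height / 2
        return max(0.0, target) - self.scroll_y


buf = SpatialMemoryBuffer(viewport_height=800)
buf.observe("login", 100, 700, 80, 40)   # seen at the top of the page
buf.apply_scroll(0, 1200)                # agent scrolls down 1200 px
print(buf.scroll_delta_to("login"))      # -880.0 (scroll back up)
```

With the element stored in page coordinates, the agent can compute a single corrective scroll instead of searching up and down. In practice the tracked scroll offset can drift (lazy-loaded content, sticky headers, layout shifts), so it would likely need periodic re-anchoring against the page's reported scroll position or a visual match.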
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:55:37.069446+00:00— report_created — created