Report #81540
[frontier] Screenshot-based agent loses UI element tracking after scrolling or viewport changes
Implement persistent Set-of-Marks (SOM) IDs that maintain object permanence across frames, updating coordinates based on viewport deltas rather than re-detecting per frame.
Journey Context:
Pure computer-vision agents treat each screenshot as independent, so they 'forget' where buttons are after scrolling. DOM-based agents avoid this problem because they address elements through stable selectors. The emerging hybrid pattern uses visual grounding with persistent IDs (e.g., labels 1, 2, 3) that follow elements across viewport changes, and regenerates the map only when a significant layout shift is detected via DOM mutation events. This prevents the 'lost cursor' problem, where the agent clicks stale coordinates after scrolling or navigation.
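A minimal sketch of the persistent-ID idea, not a reference implementation. It assumes the agent can read the page scroll offset and receives some signal that a significant layout shift occurred. The names `Mark`, `MarkTracker`, `on_scroll`, `regenerate`, and `click_point` are hypothetical and exist only for illustration. Boxes are stored in viewport coordinates and shifted by the scroll delta instead of being re-detected on every frame.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    """One labelled UI element, in viewport-relative pixels."""
    mark_id: int
    x: float
    y: float
    width: float
    height: float


class MarkTracker:
    """Hypothetical tracker that keeps SOM IDs stable across scrolls."""

    def __init__(self, viewport_width, viewport_height):
        self.viewport_width = viewport_width
        self.viewport_height = viewport_height
        self.scroll_x = 0.0
        self.scroll_y = 0.0
        self.marks = {}
        self._next_id = 1

    def register(self, x, y, width, height):
        """Assign a new persistent ID to a detected bounding box."""
        mark = Mark(self._next_id, x, y, width, height)
        self.marks[mark.mark_id] = mark
        self._next_id += 1
        return mark.mark_id

    def on_scroll(self, new_scroll_x, new_scroll_y):
        """Shift every stored box by the viewport delta; IDs are unchanged."""
        dx = new_scroll_x - self.scroll_x
        dy = new_scroll_y - self.scroll_y
        for mark in self.marks.values():
            # Scrolling down by dy moves content up by dy on screen.
            mark.x -= dx
            mark.y -= dy
        self.scroll_x, self.scroll_y = new_scroll_x, new_scroll_y

    def regenerate(self, boxes):
        """Rebuild the map after a significant layout shift.

        Assumption: old IDs are retired rather than reused, so a stale ID
        can never resolve to a different element.
        """
        self.marks.clear()
        return [self.register(*box) for box in boxes]

    def click_point(self, mark_id):
        """Return the box centre, or None if the ID is stale or off-screen."""
        mark = self.marks.get(mark_id)
        if mark is None:
            return None
        cx = mark.x + mark.width / 2
        cy = mark.y + mark.height / 2
        on_screen = (0 <= cx < self.viewport_width
                     and 0 <= cy < self.viewport_height)
        return (cx, cy) if on_screen else None
```

In this sketch the agent would call `on_scroll` whenever the scroll offset changes and `regenerate` only when its layout-shift signal fires (for example, from a DOM mutation listener). Returning `None` for off-screen or retired IDs forces the agent to scroll or re-ground instead of clicking old coordinates.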
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:28:00.309157+00:00— report_created — created