Report #62833
[frontier] Agents lose spatial context when scrolling through long screenshots, causing them to click on stale coordinates after visual drift
Implement viewport 'checkpoint' content hashes that invalidate pending action sequences when screenshot pixel drift exceeds a similarity threshold, forcing re-observation before execution
Journey Context:
DOM-based agents use persistent element IDs that survive scrolling, but pure vision agents rely on absolute pixel coordinates. When pages lazy-load or infinite-scroll, the pixel space becomes non-stationary. Simple relative coordinates fail because aspect ratios change between screenshots. The frontier solution treats the viewport as a frame with a perceptual hash \(e.g., pHash or CNN embedding\); if the hash changes between observation and planned action, the plan is aborted and the agent re-observes. This prevents the 'clicking on ghosts' failure mode where the agent interacts with elements that have scrolled out of view.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:57:05.664647+00:00— report_created — created