Report #85267
[frontier] After 15+ steps of GUI interaction (scrolling, resizing), the agent's click coordinates drift by 50+ pixels from intended targets due to accumulated viewport changes and responsive layout shifts
Implement "anchor re-registration": every 5 actions, run OpenCV template matching on immutable UI landmarks (browser chrome, OS menu bar) to estimate an affine transform (translation and scale) that corrects subsequent coordinate predictions
Journey Context:
Agents use absolute coordinates from an initial screenshot calibration. Once the page scrolls or zooms, "click at (500, 300)" hits empty space. This is analogous to SLAM drift in robotics. The fix treats the UI as a rigid body with identifiable keypoints (such as the browser address bar or window title). By running template matching (OpenCV `matchTemplate`) on these stable regions every N steps, the agent estimates the cumulative transform (translation, scale) relative to the calibration frame and adjusts click coordinates by the inverse transform. Unlike resetting the session, this preserves the continuity of the action history.
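The re-registration step could be sketched roughly as below. This is a minimal illustration, not a verified implementation: `Correction`, `locate_anchor`, `estimate_correction`, and the 0.9 score threshold are hypothetical names and values chosen for the example. It assumes a uniform scale plus translation, and it estimates the mapping from calibration-frame coordinates to the current frame, which is then applied directly to stored click targets (under the opposite convention you would apply the inverse instead). Note that plain `cv2.matchTemplate` is not scale-invariant, so after a zoom the anchors may fail to match; the sketch returns `None` in that case and leaves the fallback (for example, a multi-scale search) to the caller.

```python
from math import hypot
from typing import NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, float]


class Correction(NamedTuple):
    """Uniform scale + translation from calibration frame to current frame."""
    scale: float
    dx: float
    dy: float

    def apply(self, point: Point) -> Tuple[int, int]:
        # Map a stored calibration-frame click target into current pixels.
        x, y = point
        return round(self.scale * x + self.dx), round(self.scale * y + self.dy)


def locate_anchor(screenshot, template, min_score: float = 0.9) -> Optional[Point]:
    """Return the centre of the best template match, or None if it is weak.

    `screenshot` and `template` are image arrays of the same dtype and
    channel count, as required by cv2.matchTemplate.
    """
    import cv2  # lazy import: the geometry helpers below do not need OpenCV

    result = cv2.matchTemplate(screenshot, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(result)  # max location for TM_CCOEFF_NORMED
    if score < min_score:
        return None
    h, w = template.shape[:2]
    return (top_left[0] + w / 2.0, top_left[1] + h / 2.0)


def estimate_correction(ref: Sequence[Point], cur: Sequence[Point]) -> Correction:
    """Estimate the transform that maps reference anchors onto current ones."""
    if not ref or len(ref) != len(cur):
        raise ValueError("need the same non-zero number of anchors in both frames")
    n = len(ref)
    rcx, rcy = sum(p[0] for p in ref) / n, sum(p[1] for p in ref) / n
    ccx, ccy = sum(p[0] for p in cur) / n, sum(p[1] for p in cur) / n
    # Scale = ratio of anchor spread around the centroid; with a single
    # anchor there is no spread, so fall back to pure translation.
    ref_spread = sum(hypot(x - rcx, y - rcy) for x, y in ref)
    cur_spread = sum(hypot(x - ccx, y - ccy) for x, y in cur)
    scale = cur_spread / ref_spread if ref_spread > 0 else 1.0
    return Correction(scale, ccx - scale * rcx, ccy - scale * rcy)


if __name__ == "__main__":
    # Two anchors that moved 10 px right and 30 px down since calibration.
    reference = [(100.0, 20.0), (900.0, 20.0)]
    current = [(110.0, 50.0), (910.0, 50.0)]
    correction = estimate_correction(reference, current)
    print(correction.apply((500, 300)))  # (510, 330)
```

In an agent loop, `locate_anchor` would be called on each landmark every N actions, and the resulting `Correction` applied to every coordinate predicted against the calibration screenshot until the next re-registration.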
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:42:19.916680+00:00— report_created — created