Report #39945
[frontier] Agent clicks stale coordinates after UI scrolls in long-horizon computer use tasks
Re-ground every interaction: capture fresh screenshot and re-detect ROI coordinates via visual anchor \(e.g., icon matching\) before each click, rather than chaining relative coordinates from step 1.
Journey Context:
Leading agents \(OSWorld, Claude Computer Use\) fail on 10\+ step tasks not because of reasoning but coordinate drift. Chaining x,y from previous steps assumes static UI, but web apps scroll and reflow. The trap is optimizing for single-step accuracy instead of multi-step stability. Re-grounding costs extra tokens but prevents cascade failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:31:17.062234+00:00— report_created — created