Report #69606
[frontier] Agent attempts interactions at stale coordinates from outdated screenshots causing ghost clicks
Implement coordinate validation loops: before executing click actions, capture a lightweight screenshot or accessibility tree query to verify the target element is within 50px of expected coordinates. If drift exceeds threshold, trigger re-localization using current visual state before clicking.
Journey Context:
In long-horizon tasks \(10\+ steps\), UIs are dynamic: infinite scroll shifts content, responsive layouts adjust to window changes, animations move elements, and popups reposition underlying content. Agents caching coordinates from step 2 will miss at step 10. The naive approach is full screenshot comparison before every action \(prohibitively slow and expensive\). The robust pattern is validation with conditional re-planning: assume coordinates are stable but verify with a fast lightweight check \(accessibility tree query or focused screenshot of expected region\), and only if validation fails, perform expensive re-localization. This balances speed with reliability. Playwright's auto-waiting and retry mechanisms implement similar logic, and Anthropic's computer use reference implementation handles coordinate recalculation when elements move.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:19:03.041730+00:00— report_created — created