Report #79496
[frontier] Agents issue actions before UI updates complete, causing phantom clicks on wrong elements
Implement visual delta gating: wait for the screenshot diff to stabilize (MSE < threshold) before issuing the next action
Journey Context:
In computer-use agents, there is a race condition between the agent's perception and the environment state. The agent sees screenshot A and decides to click, but by the time the action is dispatched, the UI is mid-animation or still loading. The click therefore lands on coordinates computed from a stale frame, which may now belong to a different element. The fix is "visual delta gating": after each action, capture screenshots in a loop, compare consecutive frames (pixel-wise MSE or a perceptual hash), and proceed only when the difference falls below epsilon, which indicates a static UI. This prevents the "phantom action" syndrome, where agents click loading spinners or transition overlays.
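The gating loop described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `capture` is a hypothetical callable standing in for whatever screenshot API the agent uses, frames are assumed to be flattened grayscale pixel values, and `epsilon`, `stable_frames`, `interval_s`, and `timeout_s` are assumed parameters with placeholder defaults. The timeout and the consecutive-frame streak are additions not stated in the report; they are included so that a UI that never settles (for example, a blinking cursor or a looping spinner) cannot block the agent forever. The sketch uses pixel-wise MSE only, not a perceptual hash.

```python
import time
from typing import Callable, Sequence, Tuple

# Assumption: a frame is a flattened sequence of grayscale pixel intensities.
Frame = Sequence[float]


def mse(a: Frame, b: Frame) -> float:
    """Pixel-wise mean squared error between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def wait_until_stable(
    capture: Callable[[], Frame],          # hypothetical screenshot function
    epsilon: float = 1.0,                  # MSE threshold (placeholder value)
    stable_frames: int = 2,                # consecutive quiet diffs required
    interval_s: float = 0.1,               # delay between captures
    timeout_s: float = 5.0,                # give up after this long
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Frame, bool]:
    """Poll screenshots until consecutive frames stop changing.

    Returns (last_frame, settled). settled is False if the timeout
    expired before the UI became static.
    """
    deadline = time.monotonic() + timeout_s
    previous = capture()
    streak = 0
    while time.monotonic() < deadline:
        sleep(interval_s)
        current = capture()
        if mse(previous, current) < epsilon:
            streak += 1
            if streak >= stable_frames:
                return current, True
        else:
            streak = 0  # UI moved again; restart the count
        previous = current
    return previous, False
```

An agent loop would call `wait_until_stable` after every action and plan the next click from the returned frame. When `settled` is `False`, the caller has to decide what to do (retry, re-plan, or proceed with caution); the report does not specify this case.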
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:01:46.179771+00:00— report_created — created