Report #48182
[frontier] Agents trigger actions during UI animations or loading states, causing mis-clicks on moving elements
Implement frame stability detection: compute SSIM (structural similarity) or a perceptual hash between consecutive screenshots, and execute an action only when the inter-frame similarity exceeds 0.95 for three consecutive ticks, which indicates the UI has reached a quiescent state
Journey Context:
Clicking 'Submit' triggers a spinner animation, then a success modal slides in from the right. If the agent samples a screenshot mid-animation (at 100 ms intervals), the bounding box for the 'Close' button is stale by 50 pixels, leading to a mis-click on empty space or, worse, on a destructive button. Naive solutions add fixed sleep() calls (unreliable and slow) or wait for DOM events (which can miss CSS transitions). The robust pattern is visual stability detection, similar to camera auto-focus waiting for shake to stop. Compute SSIM (Structural Similarity Index) between frame N and frame N-1. If the similarity stays above 0.95 for three consecutive frames (about 300 ms of stability at a 100 ms sampling interval), the UI is quiescent. Only then execute the action. This removes the race with in-flight animations without arbitrary waits.
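The stability gate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `capture` is a hypothetical callable that returns a screenshot as a flattened list of 8-bit grayscale pixels, `global_ssim` computes SSIM over a single whole-frame window (a simplification of the usual windowed mean SSIM, so it is less sensitive to small local changes), and `max_ticks` is an added timeout, not part of the original report, so that a UI that never settles does not block the agent forever.

```python
import time
from statistics import fmean
from typing import Callable, Sequence

# Hypothetical frame representation: flattened 8-bit grayscale pixels.
Frame = Sequence[int]


def global_ssim(a: Frame, b: Frame, dynamic_range: float = 255.0) -> float:
    """SSIM over one whole-frame window (simplified; not windowed mean SSIM)."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    # Standard SSIM stabilizing constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean((x - mu_a) ** 2 for x in a)
    var_b = fmean((y - mu_b) ** 2 for y in b)
    cov = fmean((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def wait_until_stable(
    capture: Callable[[], Frame],
    threshold: float = 0.95,   # similarity threshold from the report
    required: int = 3,         # consecutive stable ticks from the report
    interval_s: float = 0.1,   # 100 ms sampling interval from the report
    max_ticks: int = 50,       # assumption: give up after this many ticks
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `required` consecutive frame pairs exceed `threshold`."""
    previous = capture()
    streak = 0
    for _ in range(max_ticks):
        sleep(interval_s)
        current = capture()
        # Any unstable pair resets the streak, so stability must be continuous.
        streak = streak + 1 if global_ssim(previous, current) > threshold else 0
        if streak >= required:
            return True
        previous = current
    return False
```

An agent loop would call `wait_until_stable(take_screenshot)` (where `take_screenshot` is whatever capture function the agent already has) and re-detect the target's bounding box only after it returns True. A False result means the timeout was hit and the caller should decide whether to retry or abort rather than click.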
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:21:03.091949+00:00— report_created — created