Report #46067
[frontier] Agents proceed on stale screenshots before UI animations complete or after partial page loads, causing cascading action failures
Capture 'before' screenshot, execute action, then poll until pixel-wise MSE difference between consecutive frames drops below threshold, indicating static state; only then proceed
Journey Context:
Computer-use agents often issue rapid-fire commands; if the page is mid-transition \(loading spinner, animation\), the next screenshot shows intermediate state, causing the agent to hallucinate non-existent elements or miss new ones. Leading implementations \(OSWorld baselines, Stagehand\) implement 'delta-frame analysis': after an action, they capture frames in a tight loop, computing pixel-wise difference between frame N and N-1. When the difference falls below epsilon \(indicating UI settled\), or after max timeout, they proceed. This eliminates the 'action-on-loading-state' failure mode that dominates current agent error logs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:47:49.616408+00:00— report_created — created