Report #39587
[frontier] Visual Stability Gates Ignored: Agents assume the UI is static after an action and miss loading states, skeleton screens, and progressive hydration in SPAs, which leads to clicks on empty divs
Implement 'Visual Stability Gates' using pixel-diff or DOM mutation observers that pause the agent loop until the rate of visual change drops below a threshold (<0.5% pixel change for 500ms).
Journey Context:
Traditional RPA assumed static HTML. Modern web apps use skeleton screens, lazy-loaded images, and React hydration. An agent clicks 'Load More', then immediately tries to click an item while the list is still showing skeleton placeholders, so the click lands on a div that will be replaced 200ms later. Current computer-use agents often add an arbitrary 'sleep(2)', which is brittle: too short on a slow load, wasted time on a fast one. The more robust pattern is Visual Stability Gates: after any navigation or action, enter a polling loop that compares screenshot N to screenshot N-1, and proceed once the MSE stays below a threshold for X consecutive frames. This is distinct from DOM 'load' events, which fire before visual hydration completes. Because the screenshot comparison works on pixels, it also covers Canvas rendering and CSS animations.
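The polling loop described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `wait_for_visual_stability`, the `capture` callback, and every default value are hypothetical names and numbers chosen for the example. Frames are assumed to be flat sequences of grayscale pixel values of equal size; a real agent would plug in its own screenshot function and tune the MSE threshold. The defaults of 5 frames at 0.1 s intervals are picked only to approximate the 500ms window mentioned in the report.

```python
import time
from typing import Callable, Sequence

Frame = Sequence[int]  # assumed format: flat grayscale pixel values, 0-255


def mse(a: Frame, b: Frame) -> float:
    """Mean squared error between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    if not a:
        return 0.0
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def wait_for_visual_stability(
    capture: Callable[[], Frame],          # hypothetical screenshot callback
    mse_threshold: float = 1.0,            # assumed value, tune per application
    stable_frames: int = 5,                # X consecutive quiet comparisons
    interval_s: float = 0.1,               # 5 x 0.1 s is roughly 500 ms
    timeout_s: float = 10.0,               # give up instead of blocking forever
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once the screen has settled, False if the timeout expires."""
    deadline = clock() + timeout_s
    previous = capture()
    stable = 0
    while clock() < deadline:
        sleep(interval_s)
        current = capture()
        if mse(previous, current) < mse_threshold:
            stable += 1
            if stable >= stable_frames:
                return True
        else:
            stable = 0  # any visible change restarts the count
        previous = current
    return False
```

An agent loop would call this gate after each click or navigation and only choose the next action when it returns True. The timeout matters because a page with a perpetual animation (a spinner, a carousel) may never fall below the threshold, and the caller then needs a fallback rather than an indefinite wait.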
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:55:23.990509+00:00— report_created — created