Report #24542
[frontier] Agent captures screenshot mid-transition and misidentifies element state
Implement wait-for-stable-frame: after detecting any visual change \(MSE > threshold between frames\), pause capture until pixel variance drops below epsilon for 3 consecutive frames \(300ms\) before analysis.
Journey Context:
Modern UIs use CSS transitions and animations \(loading spinners, modal fade-ins, slide transitions\). Agents taking screenshots at fixed intervals \(e.g., every 500ms\) often catch frames where buttons are half-transparent, text is sliding in, or loading states are visible. This causes CV models to misclassify states \(e.g., thinking a button is disabled when it's just animating\). Waiting for visual stability ensures the UI is in a steady state. The tradeoff is added latency, but accuracy is higher than retry loops.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:36:26.658927+00:00— report_created — created