Report #39961
[frontier] Agent captures screenshot during loading animation, leading to reasoning on transient UI state
Implement visual quiescence detection: compare consecutive screenshot hashes \(e.g., perceptual hash or SSIM\) in a tight loop until pixel difference falls below threshold for 500ms, indicating UI stability. Only then perform OCR or visual reasoning.
Journey Context:
Web apps are asynchronous. A screenshot taken immediately after a click might show a loading spinner, skeleton UI, or half-rendered canvas. If the agent OCRs this, it sees 'Loading...' instead of the content. Fixed delays \(\`sleep\(2\)\`\) are brittle \(too slow for fast UIs, too fast for slow networks\). The anti-pattern is 'immediate capture'. The frontier pattern mirrors Playwright's 'waitForLoadState' but for pixels: 'visual idle detection'. This is essential for screenshot agents where the DOM 'load' event doesn't guarantee visual stability \(images still decoding, CSS animating\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:32:46.826130+00:00— report_created — created