Report #47786
[frontier] Agents take actions on screenshots captured during loading states, animations, or transitions, resulting in clicks on moving or non-existent targets
Implement visual stability gating: compute perceptual hash \(pHash\) of viewport at 100ms intervals; only proceed when delta < threshold for 3 consecutive frames, indicating the UI has quiesced
Journey Context:
Web apps have skeleton screens, spinners, and React transitions that make elements appear at different coordinates over time. A screenshot taken at t=0 shows a button at \(100, 200\), but by the time the API processes it and the agent clicks, the button has moved to \(150, 200\) due to layout shift. The naive fix is 'sleep\(2\)', but that's slow and flaky. The frontier pattern is 'visual quiescence detection': use the browser automation to take rapid consecutive screenshots, compute perceptual diffs or simple hashes, and only act when the visual delta is below threshold. This is how modern 'computer use' agents in 2025 handle SPAs reliably.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:41:46.887941+00:00— report_created — created