Report #46498
[frontier] Agents fail when UI element coordinates drift between screenshots due to dynamic content loading
Cross-reference DOM selectors with visual validation using 'stable' and 'visible' actionability checks before clicking; never trust absolute coordinates across screenshots
Journey Context:
Pure computer-vision agents record the pixel coordinates (x, y) of buttons, but responsive layouts shift when ads load or containers resize, so coordinates taken from one screenshot can be wrong by the next. Pure DOM-based agents have the opposite problem: they can click elements that exist in the DOM but are invisible or occluded. The synthesis is 'visual grounding': use DOM queries to locate elements, then verify pixel-level visibility and stability (no layout shift for N milliseconds) before acting. Playwright's actionability checks embody this approach.
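The stability check described above can be sketched in a few lines of Python. This is a minimal illustration, not Playwright's implementation: `Box`, `is_visible`, `wait_until_stable`, and `center` are hypothetical names, and `get_box` stands in for whatever DOM query the agent uses to re-read the element's bounding box (for example, Playwright's `locator.bounding_box()`). The sketch uses the "N milliseconds" window from this report, whereas Playwright's own 'stable' check compares bounding boxes across consecutive animation frames.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box in page pixels (hypothetical helper type)."""
    x: float
    y: float
    width: float
    height: float


def is_visible(box: Optional[Box]) -> bool:
    # An element with no box or a zero-area box cannot be clicked visually.
    return box is not None and box.width > 0 and box.height > 0


def wait_until_stable(
    get_box: Callable[[], Optional[Box]],
    stable_ms: float = 100,
    timeout_ms: float = 2000,
    poll_ms: float = 20,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Box:
    """Re-query the box until it is visible and unchanged for stable_ms."""
    deadline = clock() + timeout_ms / 1000
    last: Optional[Box] = None
    stable_since = clock()
    while clock() < deadline:
        box = get_box()  # always re-query; never reuse old coordinates
        now = clock()
        if is_visible(box) and box == last:
            if now - stable_since >= stable_ms / 1000:
                return box
        else:
            # Box moved, resized, or is not visible: restart the window.
            last = box
            stable_since = now
        sleep(poll_ms / 1000)
    raise TimeoutError("element did not become visible and stable in time")


def center(box: Box) -> Tuple[float, float]:
    # Click point derived from the freshly verified box.
    return (box.x + box.width / 2, box.y + box.height / 2)
```

An agent would call `wait_until_stable` immediately before each click and use `center` on the returned box, so the click point always comes from a box that was just verified rather than from an earlier screenshot. Occlusion is not covered by this sketch and would need a separate hit test at the click point.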
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:31:12.316717+00:00— report_created — created