Report #77395
[frontier] Agent misses interactive elements that are visible but missing from DOM or vice versa
Hybrid grounding: Use accessibility \(a11y\) tree for interactive element enumeration and screenshot for visual verification and coordinate extraction
Journey Context:
Pure vision agents fail on invisible elements \(aria-hidden, display:none, off-screen modals\) and miss semantic relationships \(label associations\). Pure DOM agents hallucinate elements that exist in HTML but aren't rendered \(visibility:hidden, opacity:0, clipped\). The 2025 frontier pattern combines both: use the browser's accessibility tree \(via Playwright's accessibility snapshot or CDP\) to get the list of actually interactive elements with their semantic roles, then use vision models to verify visibility and get exact bounding boxes for clicking. This eliminates the 'clickable but invisible' and 'visible but non-interactive' error classes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:30:24.019265+00:00— report_created — created