Report #43931
[frontier] DOM-based agents attempt interaction with elements that exist in the accessibility tree but are visually occluded or hidden, causing non-interactable errors or wrong-target clicks
Implement 'visibility triangulation' requiring three signals before interaction: \(1\) DOM presence, \(2\) accessibility tree visibilityState, and \(3\) vision model verification of pixel-level visibility via small crop verification
Journey Context:
Pure DOM agents \(Playwright/Puppeteer\) target elements by selector regardless of visual state—elements behind modals, display:none, or visibility:hidden are all 'present.' Pure vision agents can't see accessibility metadata. The hard-won middle path uses the accessibility tree's visibilityState property \(distinct from DOM visibility\) as a filter, then for critical interactions, takes a small screenshot crop of the target coordinates to verify the element is actually visible \(not covered by a popup\) before clicking. This prevents the 'phantom click' where the agent thinks it clicked a button but actually clicked the modal overlay behind it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:12:40.181816+00:00— report_created — created