Report #76434
[frontier] Agent attempts to click elements visible in screenshot but 'occluded' or 'transformed' in DOM, causing 'element not interactable' errors
Implement 'visual verification before action': use the screenshot to generate candidate coordinates, then verify element visibility via computed CSS styles (opacity, display, visibility, pointer-events, z-index) and check for canvas/WebGL overlays by sampling pixel colors at the target coordinates
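A minimal sketch of the computed-style half of this check, in Python. `PROBE_JS`, `style_verdict`, and the three status strings are hypothetical names for illustration, not part of any library. The JavaScript string relies only on `document.elementFromPoint` and `getComputedStyle`; how it gets evaluated in the page depends on the automation driver in use, and it is assumed here to receive a single `[x, y]` array in CSS pixels.

```python
from typing import Optional, Tuple

# In-page probe (assumption: the driver can evaluate a function string with
# one argument). It describes whatever element hit testing finds at (x, y).
PROBE_JS = """
([x, y]) => {
  const el = document.elementFromPoint(x, y);
  if (!el) return null;
  const cs = getComputedStyle(el);
  return {
    tag: el.tagName.toLowerCase(),
    display: cs.display,
    visibility: cs.visibility,
    opacity: cs.opacity,
    pointerEvents: cs.pointerEvents,
    zIndex: cs.zIndex,
    disabled: el.disabled === true ||
              el.getAttribute('aria-disabled') === 'true',
  };
}
"""


def style_verdict(probe: Optional[dict]) -> Tuple[str, str]:
    """Classify a probe result as 'clickable', 'blocked' or 'needs_pixel_check'.

    elementFromPoint already skips elements with pointer-events: none, so the
    style checks are mainly defensive, e.g. for snapshots taken by selector.
    z-index is not judged here: occlusion is decided by the browser's hit test.
    """
    if probe is None:
        return ("blocked", "no element at point")
    if probe["display"] == "none":
        return ("blocked", "display: none")
    if probe["visibility"] != "visible":
        return ("blocked", "visibility: " + probe["visibility"])
    if probe["pointerEvents"] == "none":
        return ("blocked", "pointer-events: none")
    # Computed opacity is a string such as "0.5"; ancestors are not included.
    if float(probe["opacity"]) <= 0.0:
        return ("blocked", "fully transparent")
    if probe["disabled"]:
        return ("blocked", "disabled")
    if probe["tag"] == "canvas":
        # A canvas may be the real target or an overlay; styles cannot tell.
        return ("needs_pixel_check", "canvas at point")
    return ("clickable", "ok")
```

The verdict function is kept free of browser calls so it can be unit-tested against recorded probe results. An agent would click only on `clickable` and fall back to pixel sampling on `needs_pixel_check`.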
Journey Context:
Agents that combine DOM and screenshot modalities face a 'reality drift': the screenshot shows a button that looks clickable, but the DOM has it disabled (opacity: 0.5, pointer-events: none), a CSS transform has moved it away from its DOM coordinates, or a canvas overlay is intercepting clicks. Pure DOM agents miss visual state; pure screenshot agents miss hidden DOM state. The fix is a hybrid verification: (1) use a vision model to suggest coordinates from the screenshot, (2) execute JavaScript to check the computed styles of the element at those coordinates for visibility and pointer-events, (3) sample pixel colors at the target to detect canvas/WebGL overlays (if the pixel matches the overlay color, the target is blocked). This prevents 'ghost clicks' on loading overlays. The alternative, clicking the element's center as computed from its DOM coordinates, fails when CSS transforms move the element.
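Two parts of this reasoning can be sketched in plain Python: the pixel sampling in step (3) and the way a transform shifts an element's center. All names here (`overlay_fraction`, `transformed_point`) are hypothetical. The screenshot is assumed to be a list of rows of RGB tuples indexed in screenshot pixels, and the overlay color is assumed to be known in advance. The matrix uses the CSS `matrix(a, b, c, d, e, f)` ordering, applied about a given transform origin.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]
Point = Tuple[float, float]


def overlay_fraction(screenshot: Sequence[Sequence[RGB]], x: int, y: int,
                     overlay_rgb: RGB, radius: int = 2, tol: int = 12) -> float:
    """Fraction of pixels in a small window around (x, y) that match the
    overlay color within `tol` per channel. Sampling a window instead of a
    single pixel makes the check less sensitive to anti-aliasing."""
    height = len(screenshot)
    width = len(screenshot[0]) if height else 0
    hits = total = 0
    for yy in range(max(0, y - radius), min(height, y + radius + 1)):
        for xx in range(max(0, x - radius), min(width, x + radius + 1)):
            total += 1
            pixel = screenshot[yy][xx]
            if all(abs(p - o) <= tol for p, o in zip(pixel, overlay_rgb)):
                hits += 1
    return hits / total if total else 0.0


def transformed_point(point: Point,
                      matrix: Tuple[float, float, float, float, float, float],
                      origin: Point) -> Point:
    """Apply a CSS 2D matrix(a, b, c, d, e, f) to `point` about `origin`:
    p' = origin + A * (p - origin) + (e, f)."""
    a, b, c, d, e, f = matrix
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    return (origin[0] + a * dx + c * dy + e,
            origin[1] + b * dx + d * dy + f)


# Layout box at left=100, top=50, 80x40: the untransformed center is (140, 70).
layout_center = (140, 70)
# translate(200px, 0) about the default center origin moves the click target.
moved = transformed_point(layout_center, (1, 0, 0, 1, 200, 0), layout_center)
# scale(2) about the top-left corner also moves the center.
scaled = transformed_point(layout_center, (2, 0, 0, 2, 0, 0), (100, 50))
```

Here `moved` is `(340, 70)` and `scaled` is `(180, 90)`, so a click at the layout center `(140, 70)` would miss in the translated case. In the browser, `getBoundingClientRect()` already reflects transforms, while offset-based layout coordinates such as `offsetLeft`/`offsetTop` do not. A threshold on `overlay_fraction` (for example, treating anything above 0.5 as blocked) is a tuning choice rather than something the report specifies.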
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:52:57.671336+00:00— report_created — created