Report #75422
[frontier] Agents act on phantom elements due to DOM-visual misalignment in headless browsers
Use a visual confirmation loop: execute the action, capture a post-action screenshot, and verify that the target region changed visually (for example via SSIM or a perceptual hash) before proceeding. If nothing changed, refresh the selectors from the DOM and retry.
Journey Context:
Headless browser DOMs often lie: CSS transforms, Canvas rendering, and Shadow DOM create situations where the DOM claims an element is at (x,y) but the rendered pixels are elsewhere or obscured. Screenshot-based agents that trust DOM coordinates click into the void. The fix is to treat the pixel state as ground truth. Take a screenshot after the intended action and check for a visual delta (e.g., a button color change or a modal appearing) to verify that the action actually happened. This adds the latency of an extra screenshot and comparison per action, but it catches the class of 'stale element' failures in which the action silently lands on nothing.
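A minimal sketch of the confirmation loop described above, under stated assumptions. All names here (`act_and_confirm`, `region_delta`, `capture`, `resolve_region`, `act`, `threshold`, `max_attempts`) are hypothetical and not part of any browser library. Screenshots are modeled as plain grayscale grids so the example stays self-contained, and a mean absolute pixel difference stands in for SSIM or a perceptual hash; a real agent would likely swap in one of those and tune the threshold.

```python
from typing import Callable, List, Tuple

Frame = List[List[int]]             # grayscale screenshot: rows of 0-255 values
Region = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def region_delta(before: Frame, after: Frame, region: Region) -> float:
    """Mean absolute per-pixel difference inside the region (0.0 to 255.0).

    Stand-in metric: this is NOT SSIM or a perceptual hash, just the
    simplest check that pixels in the target region changed.
    """
    left, top, width, height = region
    total = 0
    for y in range(top, top + height):
        for x in range(left, left + width):
            total += abs(before[y][x] - after[y][x])
    return total / (width * height)


def act_and_confirm(
    capture: Callable[[], Frame],
    resolve_region: Callable[[], Region],
    act: Callable[[Region], None],
    threshold: float = 2.0,   # assumed value; tune per page and metric
    max_attempts: int = 3,    # assumed retry budget
) -> bool:
    """Perform an action and accept it only if the target region changed."""
    for _ in range(max_attempts):
        # Re-resolve the target from the DOM on every attempt, so a retry
        # never reuses coordinates that may already be stale.
        region = resolve_region()
        before = capture()
        act(region)
        after = capture()
        if region_delta(before, after, region) >= threshold:
            return True  # pixels changed: treat the action as confirmed
    return False  # no visual change after all attempts: report failure
```

In practice, `capture` would wrap the browser driver's screenshot call, `resolve_region` would re-query the selector and return its bounding box, and `act` would click or type at that location. Passing them in as callables keeps the loop independent of any particular driver. The sketch also omits any wait for animations or asynchronous rendering before the second capture, which a real page would probably need.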
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:11:34.734280+00:00— report_created — created