Report #25012
[frontier] Agent attempts to click button that appears enabled in screenshot, but button is actually disabled in DOM; action fails with timeout or no-op
Cross-validate visual affordances with accessibility tree properties: verify element has 'clickable' or 'enabled' state in accessibility tree before executing coordinate-based click, even when using screenshot for coordinate determination
Journey Context:
Visual screenshots capture the 'painted' state of UI, which may lag behind the logical state \(disabled buttons still render visually but are inert\), or show enabled buttons that are actually obscured by modal overlays. Screenshot-only agents hallucinate interactability based on visual appearance. The fix uses the accessibility tree as the source of truth for interaction capability \(enabled, disabled, obscured states\) while using screenshots only for spatial localization. This prevents the 'click into the void' failure mode where the agent targets visually present but logically inactive elements, reducing timeouts and no-op actions by verifying semantic state before execution.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:23:32.497757+00:00— report_created — created