Report #44820
[frontier] Agents hallucinate clicking buttons that look interactive but are disabled divs
Enforce accessibility-first grounding: cross-reference every vision-proposed click with the target element's accessibility tree properties (clickable: true, aria-disabled: false). Use vision for 'where' but the a11y tree for 'can click'.
Journey Context:
VLMs trained on visual affordances interpret shaded rectangles with text as 'buttons'. Modern web apps, however, contain both active buttons that look disabled and disabled divs that look active, so appearance alone is unreliable. The accessibility tree (via Playwright CDP or OS accessibility APIs) contains ground truth about interactivity. Frontier pattern: vision proposes candidate coordinates; the a11y tree verifies that those coordinates fall within an element with actionable=true and disabled=false. On a mismatch, query vision again with the hint 'target not interactive'. This prevents clicks on decorative images and disabled controls. Common failures: the agent clicks a 'Delete' button that is actually a non-interactive banner, or it misses that a button is disabled because the VLM reads its grayed-out styling as 'ready to click'.
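The propose-then-verify loop described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `A11yNode`, `INTERACTIVE_ROLES`, `check_click`, `grounded_click`, and the `propose` callback are all hypothetical names. It assumes the accessibility tree has already been flattened into nodes with bounding boxes in the same coordinate space as the screenshot, and it ignores occlusion and z-order.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Roles treated as clickable in this sketch (an assumption, not an exhaustive list).
INTERACTIVE_ROLES = {"button", "link", "checkbox", "radio", "menuitem", "tab", "switch"}


@dataclass(frozen=True)
class A11yNode:
    """Hypothetical flattened accessibility node with a screen-space bounding box."""
    role: str
    name: str
    x: float
    y: float
    width: float
    height: float
    disabled: bool = False

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


def check_click(nodes: List[A11yNode], px: float, py: float) -> Tuple[bool, str]:
    """Return (ok, reason) for a proposed click point, judged by the a11y nodes only."""
    # Consider only interactive nodes under the point, so a text label nested
    # inside a button does not mask the button itself.
    hits = [n for n in nodes if n.contains(px, py) and n.role in INTERACTIVE_ROLES]
    if not hits:
        return False, "target not interactive"
    # Prefer the smallest (innermost) interactive node containing the point.
    target = min(hits, key=lambda n: n.width * n.height)
    if target.disabled:
        return False, "target is disabled"
    return True, "ok"


def grounded_click(
    propose: Callable[[Optional[str]], Tuple[float, float]],
    nodes: List[A11yNode],
    max_attempts: int = 3,
) -> Optional[Tuple[float, float]]:
    """Ask the vision model (propose) for coordinates until the a11y check passes.

    The rejection reason from the previous attempt is passed back as a hint.
    Returns verified coordinates, or None if every attempt was rejected.
    """
    hint: Optional[str] = None
    for _ in range(max_attempts):
        px, py = propose(hint)
        ok, reason = check_click(nodes, px, py)
        if ok:
            return px, py
        hint = reason
    return None
```

In a real agent, `propose` would wrap the VLM call and `nodes` would be rebuilt from a fresh accessibility snapshot before each attempt, since the page may change between proposals. Returning `None` after `max_attempts` lets the caller decide whether to escalate or abort instead of clicking an unverified target.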
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:41:52.754488+00:00— report_created — created