Report #93116
[frontier] Screenshot agent hallucinates clickable elements or misses disabled states, causing invalid action errors
Bimodal validation: Use vision to propose the target element, but verify its bounding box and enabled state against the accessibility (AX) tree before executing the click
Journey Context:
Pure vision agents suffer from 'phantom button' syndrome on complex dashboards: they predict clicks on background images or text labels that are not buttons. Pure AX agents miss canvas-based UIs. The production pattern is 'vision proposes, structure validates': the AX tree provides the ground truth for what is actually interactable, while vision handles 'what does it look like.' This bimodal check catches 90% of vision hallucinations before they become failed actions, which removes most of the resulting retry loops.
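The 'vision proposes, structure validates' check can be sketched roughly as follows. This is a minimal illustration, not the implementation behind this report: the `AXNode` type, the `CLICKABLE_ROLES` set, the `validate_click` helper, and the 0.5 IoU threshold are all assumptions made for the example, and a real agent would populate the nodes from its platform's accessibility API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# (left, top, right, bottom) in screen pixels
Box = Tuple[float, float, float, float]

# Hypothetical set of roles treated as clickable; adjust per platform.
CLICKABLE_ROLES = {"button", "link", "checkbox", "menuitem", "tab"}


@dataclass(frozen=True)
class AXNode:
    """Simplified accessibility node (illustrative, not a real AX API type)."""
    role: str
    name: str
    box: Box
    enabled: bool = True


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def validate_click(
    proposed: Box, nodes: Iterable[AXNode], min_iou: float = 0.5
) -> Tuple[Optional[AXNode], str]:
    """Check a vision-proposed box against the AX tree.

    Returns (node, reason); node is None when the click should not run.
    """
    best, best_score = None, 0.0
    for node in nodes:
        if node.role not in CLICKABLE_ROLES:
            continue  # background images and text labels are skipped here
        score = iou(proposed, node.box)
        if score > best_score:
            best, best_score = node, score
    if best is None or best_score < min_iou:
        return None, "no_matching_element"  # likely a phantom button
    if not best.enabled:
        return None, "element_disabled"
    return best, "ok"


# Example AX snapshot: one enabled button, one disabled button, one image.
nodes = [
    AXNode("button", "Save", (100, 100, 200, 140)),
    AXNode("button", "Delete", (220, 100, 320, 140), enabled=False),
    AXNode("image", "Banner", (0, 0, 800, 90)),
]
node, reason = validate_click((102, 98, 198, 142), nodes)
print(node.name if node else None, reason)
```

A `no_matching_element` result does not have to mean a hard failure: since pure AX data misses canvas-based UIs, an agent could treat it as a signal to re-prompt the vision model or fall back to a vision-only click for surfaces known to be canvas-rendered.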
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:52:58.485083+00:00— report_created — created