Report #38391
[frontier] Agents hallucinate UI element locations or clickability, generating actions on non-existent or disabled elements
Require the agent to output bounding box coordinates for the target element, then verify that those coordinates map to an interactable element (via the DOM's document.elementFromPoint() or a pixel check) before executing the click.
Journey Context:
Agents often 'hallucinate' that a button exists at certain coordinates because they are working from an outdated screenshot or from incorrect reasoning. Blindly retrying the same action wastes API calls, since nothing about the agent's view of the page has changed. The robust pattern is 'grounding verification': the agent proposes an action (e.g., 'click the Submit button') and provides the target's bounding box [x1,y1,x2,y2]. Before sending the mouse event, the execution layer verifies that this region contains a clickable, enabled element, either by querying the DOM with document.elementFromPoint() or by checking that a screenshot crop at those coordinates matches the expected visual appearance. If verification fails, the click is not sent; instead the agent is prompted with the current screenshot and asked to re-ground. This prevents cascading errors, where a phantom click derails the rest of the task trajectory.
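The DOM-based check described above can be sketched roughly as follows. This is a minimal illustration, not the report's implementation: the names Box, ProbedElement, Probe, Verdict, and verifyGrounding are hypothetical, the lists of clickable tags and roles are an assumed heuristic, and probing only the center of the box is a simplification. The probe is injected so the logic can run without a browser.

```typescript
// Hypothetical types for illustration; none of these names come from the report.
type Box = [number, number, number, number]; // [x1, y1, x2, y2] in viewport pixels

interface ProbedElement {
  tag: string;       // lower-case tag name, e.g. "button"
  role?: string;     // ARIA role, if present
  disabled: boolean; // true if the element is disabled
  text: string;      // visible label or text content
}

// Returns the topmost element at a viewport point, or null if there is none.
type Probe = (x: number, y: number) => ProbedElement | null;

type Verdict =
  | { ok: true; x: number; y: number }
  | { ok: false; reason: string };

// Assumed heuristic for "clickable"; a real system may need a broader check.
const CLICKABLE_TAGS = new Set(["a", "button", "input", "select", "textarea", "summary"]);
const CLICKABLE_ROLES = new Set(["button", "link", "checkbox", "menuitem", "tab"]);

function verifyGrounding(box: Box, expectedText: string, probe: Probe): Verdict {
  const [x1, y1, x2, y2] = box;
  // Reject empty, inverted, or NaN boxes before touching the page.
  if (!(x2 > x1 && y2 > y1)) {
    return { ok: false, reason: "degenerate bounding box" };
  }
  // Probe the center of the proposed box.
  const x = (x1 + x2) / 2;
  const y = (y1 + y2) / 2;
  const el = probe(x, y);
  if (el === null) {
    return { ok: false, reason: "no element at box center" };
  }
  if (el.disabled) {
    return { ok: false, reason: "element is disabled" };
  }
  const clickable =
    CLICKABLE_TAGS.has(el.tag) ||
    (el.role !== undefined && CLICKABLE_ROLES.has(el.role));
  if (!clickable) {
    return { ok: false, reason: `<${el.tag}> is not clickable` };
  }
  // Guard against clicking a real but wrong control.
  if (!el.text.toLowerCase().includes(expectedText.toLowerCase())) {
    return { ok: false, reason: "label mismatch" };
  }
  return { ok: true, x, y };
}
```

In a browser, a probe could wrap document.elementFromPoint(x, y), which takes viewport coordinates and returns the topmost element at that point or null. Because the topmost element may be a child of the real control (for example, a span inside a button), a practical probe would probably walk up to the nearest clickable ancestor first. When the verdict is not ok, the reason string can be passed back to the agent along with a fresh screenshot as the re-grounding prompt.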
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:55:06.548947+00:00— report_created — created