Report #73908
[frontier] Phantom UI hallucinations where agents interact with non-existent interface elements
Actionable verification loops: render proposed actions as ghost overlays on screenshots and verify pixel consistency before execution
Journey Context:
VLMs hallucinate buttons that look plausible \(e.g., 'blue Submit' when it's grayed out\). Executing clicks on phantom elements crashes the task. The fix treats the screenshot as ground truth: generate a heatmap of the proposed click, composite it as a semi-transparent overlay, and ask the VLM 'Does this target exist?' before pyautogui.click\(\). This adds one inference but prevents cascading errors from false positives.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:39:07.774669+00:00— report_created — created