Report #95373
[frontier] Vision agents hallucinate clicks on elements that moved between screenshot capture and action execution
Implement MutationObserver to detect DOM changes or visual diffs between screenshot and click, and retry the decision loop if the target region changed significantly
Journey Context:
This is the 'phantom click' or 'action race condition'. The agent sees a button at \(800, 600\) in the screenshot. By the time pyautogui.click executes 500ms later, a loading spinner appeared and the button shifted to \(800, 650\). The click misses or hits the wrong element. In traditional Selenium/WebDriver, this is handled by implicit waits, but vision agents lack this feedback loop. The fix is to treat the period between 'decision' and 'action' as critical: use MutationObserver to detect if the target element's bounding box changed, or take a rapid 'verification screenshot' before clicking. If changed, re-plan.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:39:37.946683+00:00— report_created — created