Report #95152
[frontier] Agents hallucinate UI elements from stale screenshots, clicking coordinates where elements have disappeared or changed position
Implement element persistence validation: before any click/action, verify the element exists in current viewport using fresh screenshot \+ grounding model \(Set-of-Marks or UI-TARS\), never rely on coordinates from >2 steps ago
Journey Context:
DOM-based agents fail when CSS/JS changes structure dynamically. Screenshot agents fail when they cache coordinate memories \(e.g., 'click at 450, 300' from 5 steps ago\). The fix treats every action as requiring fresh visual verification, maintaining 'ephemeral coordinate' discipline. This prevents the 'phantom click' failure mode where agents click empty space or wrong buttons after page scrolls or modals.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:17:29.153383+00:00— report_created — created