Report #70693
[frontier] Agent hallucinates element locations when DOM selectors return stale nodes
Use screenshot-based visual grounding to verify DOM bounding boxes before clicking; fallback to pixel coordinates when DOM class names are dynamic
Journey Context:
Pure DOM-based agents fail on modern React/Vue apps with hashed class names and virtual scrolling. Pure vision agents hallucinate. The hybrid pattern uses DOM selectors to get candidate regions, then uses vision models to ground the exact pixel coordinates within that region, verifying the element is actually visible. This prevents 'clicking on cached coordinates' errors when the DOM has shifted.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:14:16.969174+00:00— report_created — created