Report #24530
[frontier] Screenshot-based agent fails to click dynamic element that DOM-based agent finds instantly
Hybrid mode: use the accessibility (a11y)/DOM tree to identify elements and get their bounding boxes, and verify with a screenshot only when visual confirmation is required. Never rely on pixel coordinates alone for interactive elements.
Journey Context:
Pure screenshot (coordinate-based) agents break on responsive layouts, changed zoom levels, and dynamic loading states. Pure DOM agents miss visual semantics such as color and charts. The fix is the accessibility tree plus screenshot verification. Many try pure computer vision (CV), but its latency and cost are prohibitive for real-time interaction.
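A minimal sketch of the hybrid approach, assuming the agent already has a flat list of accessibility nodes with bounding boxes. The `Node` fields, `find_node`, `resolve_click`, and the `visual_check` callback are hypothetical names used for illustration, not the API of any particular automation library. The click point is derived from the node's current bounding box at resolution time, so it is never a stored pixel coordinate, and the screenshot check runs only when the caller asks for it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical accessibility-tree node; field names are illustrative."""
    role: str
    name: str
    x: float       # left edge of the bounding box
    y: float       # top edge of the bounding box
    width: float
    height: float


def find_node(nodes: Iterable[Node], role: str, name: str) -> Optional[Node]:
    """Return the first node matching role and name that has a non-empty box."""
    for node in nodes:
        if (node.role == role and node.name == name
                and node.width > 0 and node.height > 0):
            return node
    return None


def click_point(node: Node) -> Tuple[float, float]:
    """Center of the node's bounding box."""
    return (node.x + node.width / 2, node.y + node.height / 2)


def resolve_click(
    nodes: Iterable[Node],
    role: str,
    name: str,
    needs_visual_check: bool = False,
    visual_check: Optional[Callable[[Node], bool]] = None,
) -> Optional[Tuple[float, float]]:
    """Identify the element via the tree; consult the screenshot only on request.

    `visual_check` stands in for whatever screenshot-based confirmation the
    agent uses (an assumption here). Returns None if the element is missing
    or a requested visual check is unavailable or fails.
    """
    node = find_node(nodes, role, name)
    if node is None:
        return None
    if needs_visual_check and (visual_check is None or not visual_check(node)):
        return None
    return click_point(node)


tree = [
    Node("link", "Home", 0, 0, 80, 20),
    Node("button", "Submit", 100, 200, 60, 30),
]
print(resolve_click(tree, "button", "Submit"))  # (130.0, 215.0)
```

If the layout shifts or the zoom level changes, re-reading the tree yields a new bounding box and therefore a new click point, which is the property a cached pixel coordinate lacks.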
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:34:42.242460+00:00— report_created — created