Report #43801
[frontier] Agents fail when accessibility trees describe occluded elements, or when screenshots show elements not yet in the DOM, causing phantom clicks
Implement a consensus architecture in which a DOM agent and a vision agent must agree on element presence via bounding-box overlap (IoU > 0.5) before acting; reject any action where the DOM rect does not align with the vision detection
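A minimal sketch of the overlap check described above, assuming both agents report axis-aligned boxes in the same viewport pixel coordinates. The names `Box`, `iou`, and `agents_agree` are hypothetical and not part of any existing library; only the 0.5 threshold comes from the report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in viewport pixels (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter_w = max(0.0, min(a.x + a.width, b.x + b.width) - max(a.x, b.x))
    inter_h = max(0.0, min(a.y + a.height, b.y + b.height) - max(a.y, b.y))
    inter = inter_w * inter_h
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


def agents_agree(dom_rect: Optional[Box], vision_box: Optional[Box],
                 threshold: float = 0.5) -> bool:
    """True only if both agents report the element and IoU exceeds the threshold."""
    if dom_rect is None or vision_box is None:
        return False  # one sensor did not see the element at all
    return iou(dom_rect, vision_box) > threshold
```

With this sketch, a DOM rect of 10x10 at the origin and a vision box shifted 2 px to the right agree (IoU about 0.67), while a 5 px shift does not (IoU about 0.33).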
Journey Context:
Pure DOM agents miss visual reality such as popups, lazy loading, and visual occlusion. Pure vision agents miss semantic structure and ARIA labels. Simply concatenating both inputs confuses the LLM with conflicting signals. The consensus pattern instead treats the two agents as redundant sensors in a control system. If the DOM says 'button exists' but vision sees only a loading spinner, the agent waits or rescrolls. This prevents 'phantom clicks' on elements that exist in the HTML but are visually hidden. Tradeoff: latency roughly doubles (two inferences per action). Alternative priority systems (DOM primary) fail on dynamic single-page apps where DOM updates lag behind visual rendering.
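The redundant-sensor behaviour can be sketched as a small decision function. This is an illustration under stated assumptions, not the report's implementation: the `Decision` enum, the `decide` function, and the vision labels `"target"` and `"spinner"` are all hypothetical, and the choice between waiting and rescrolling in each branch is one plausible reading of the text.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ACT = "act"
    WAIT = "wait"
    RESCROLL = "rescroll"
    REJECT = "reject"


def decide(dom_found: bool, vision_label: Optional[str],
           overlap: float, threshold: float = 0.5) -> Decision:
    """Combine DOM and vision readings into one action decision.

    vision_label is a hypothetical classifier output: "target" when the
    element is visible, "spinner" for a loading indicator, None otherwise.
    overlap is the IoU between the DOM rect and the vision box.
    """
    vision_found = vision_label == "target"
    if dom_found and vision_found:
        # Both sensors see it: act only if the boxes actually line up.
        return Decision.ACT if overlap > threshold else Decision.REJECT
    if dom_found and vision_label == "spinner":
        return Decision.WAIT      # in the DOM, but still loading visually
    if dom_found:
        return Decision.RESCROLL  # in the DOM, but hidden or off-screen
    if vision_found:
        return Decision.WAIT      # rendered, but the DOM snapshot lags behind
    return Decision.REJECT        # neither sensor sees the element
```

A caller would typically loop on `WAIT` and `RESCROLL` with a retry budget, so a permanently occluded element ends in a rejection and never in a click.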
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:59:25.600990+00:00— report_created — created