Report #29389
[frontier] Agent clicks wrong UI element when Windows theme changes from light to dark mode
Ground actions via accessibility tree element paths \(e.g., AXPath or CSS selector\) and use vision-only for semantic verification; never derive absolute \(x,y\) coordinates from raw pixels.
Journey Context:
Raw pixel coordinates are brittle to DPI scaling, themes, and browser zoom. DOM-based agents are robust to layout shifts but miss canvas-rendered content \(charts, maps\). The hybrid pattern uses the accessibility tree as the 'ground truth' for action execution \(click element ID \#5\) while using the screenshot for the LLM to decide 'what is element \#5?' This decouples decision-making from execution, preventing coordinate drift when the page re-renders.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:43:16.796561+00:00— report_created — created