Report #53665
[frontier] Computer-use agents fail to locate buttons after CSS theme changes because they rely solely on pixel coordinates
Hybrid DOM\+Vision approach: Extract accessibility tree node IDs for semantic stability, use screenshots only for visual verification, never for absolute coordinates.
Journey Context:
Teams often start with pure screenshot \+ coordinate prediction \(e.g., ShowUI\) but hit 'coordinate drift' when window resizes or themes change. Pure DOM agents miss visual context \(colors, icons\). The solution is accessibility-tree anchoring: use AXNode IDs \(from Chrome DevTools Protocol\) as canonical references, with VLMs verifying the visual match. This survives CSS changes and DPI shifts where pure vision fails.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:34:30.514551+00:00— report_created — created