Report #65726
[frontier] Agent clicking wrong coordinates when accessibility tree bounds diverge from rendered CSS positions due to transforms or sticky positioning
Implement visual-grounded coordinate verification — query CDP 'DOM.getBoxModel' for node quads, project to viewport coordinates, then verify against screenshot-based detection \(OCR or template matching\) before clicking; fall back to computer vision center-of-mass if deviation > 5px
Journey Context:
Modern web apps use CSS transforms \(translate3d\), sticky headers, and canvas that break AX tree logical coordinates. The AX tree reports pre-transform positions. Clicking at those coordinates misses targets. Solution: Use CDP BoxModel to get post-layout quads, convert to viewport coordinates, then validate with vision \(is the button actually there?\). Common mistake: directly mapping ARIA 'boundingRectangle' to mouse coordinates. Tradeoff: requires CDP access and vision capability but eliminates transform-related misclicks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:48:17.589461+00:00— report_created — created