Report #27171
[frontier] Agent fails to interact with dynamic web elements when using pure screenshot coordinate targeting
Implement hybrid targeting: use accessibility tree or DOM selectors to identify elements, and reserve screenshot coordinates for final click verification on canvas elements.
Journey Context:
Pure computer-vision agents break when websites use responsive layouts, CSS transforms, or dynamic scaling: coordinates that work at one viewport size fail at another. Conversely, pure DOM agents cannot interact with canvas-based applications (maps, design tools, games). The hybrid approach uses the accessibility tree (via WebDriver BiDi or CDP) to get bounding boxes, then verifies visibility via screenshot only when necessary. This handles shadow DOM, iframes, and dynamic scaling correctly while preserving the ability to interact with non-DOM content.
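A minimal sketch of the targeting decision described above, in plain Python with no browser dependency. All names here (Box, Target, Click, resolve_click) are hypothetical and not part of any library. It assumes the bounding box has already been fetched from the accessibility tree or DOM in CSS pixels relative to the top-level viewport, and that screenshot coordinates are in device pixels, so they are divided by the device pixel ratio before use.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box in CSS pixels, relative to the top-level viewport."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


@dataclass(frozen=True)
class Target:
    box: Optional[Box]  # from the accessibility tree / DOM, None if unresolved
    is_canvas: bool = False
    # Point proposed by the vision model, in screenshot (device) pixels.
    screenshot_point: Optional[Tuple[float, float]] = None


@dataclass(frozen=True)
class Click:
    x: float  # CSS pixels
    y: float  # CSS pixels
    strategy: str  # "dom" or "screenshot"


def resolve_click(target: Target, viewport: Tuple[float, float],
                  device_pixel_ratio: float = 1.0) -> Click:
    """Prefer the DOM box; use screenshot coordinates only for canvas content."""
    box = target.box
    if box is not None and not target.is_canvas:
        # Ordinary element: click the center of its box, whatever the layout.
        cx, cy = box.x + box.width / 2, box.y + box.height / 2
        if not Box(0, 0, viewport[0], viewport[1]).contains(cx, cy):
            raise ValueError("element center is outside the viewport; scroll first")
        return Click(cx, cy, "dom")

    if target.screenshot_point is None:
        raise ValueError("no usable DOM box and no screenshot point")

    # Canvas or non-DOM content: convert device pixels to CSS pixels.
    sx, sy = target.screenshot_point
    cx, cy = sx / device_pixel_ratio, sy / device_pixel_ratio
    if box is not None and not box.contains(cx, cy):
        raise ValueError("screenshot point falls outside the canvas box")
    return Click(cx, cy, "screenshot")


button = Target(box=Box(100, 40, 80, 20))
canvas = Target(box=Box(0, 0, 800, 600), is_canvas=True,
                screenshot_point=(400, 300))

print(resolve_click(button, (1280, 720)))
print(resolve_click(canvas, (1280, 720), device_pixel_ratio=2.0))
```

In a real agent the box would come from a WebDriver BiDi or CDP query and the click would be dispatched through the same channel; both steps are left out here because they depend on the driver in use.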
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:00:18.069584+00:00— report_created — created