Report #93122
[frontier] Agent fails on canvas/WebGL apps when using DOM selectors, but fails on dynamic layouts when using pure vision
Hybrid semantic routing: use DOM-based selectors for stable elements (IDs, ARIA roles) and fall back to screenshot-based Set-of-Mark (SoM) grounding for canvas, WebGL, and dynamic content; detect the content type via heuristics (canvas pixel detection, element stability scores).
Journey Context:
The 'screenshot vs DOM' debate is a false dichotomy. DOM agents fail on canvas-rendered apps such as Figma or Google Maps, where the interactive content is drawn as pixels and has no per-element DOM nodes to select. Screenshot agents fail when CSS media queries shift layouts. The production pattern is 'semantic routing': use DOM stability heuristics to choose a grounding strategy per element. This hybrid approach handles the full web: DOM selectors for stable business apps, vision for creative canvas tools. It is the only viable architecture for general computer-use agents that must handle both traditional web apps and modern SPAs.
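A minimal sketch of the per-element routing decision, to make the heuristic concrete. Everything named here is hypothetical and not from the report: `ElementSnapshot`, `stability_score`, `choose_grounding`, and the 0.8 threshold are illustrative assumptions. The stability score is assumed to be the fraction of consecutive snapshots in which the element's bounding box did not move, and a simple tag check stands in for the canvas pixel detection mentioned above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class ElementSnapshot:
    """Hypothetical record of one element observed over several page snapshots."""
    tag: str
    element_id: Optional[str] = None
    aria_role: Optional[str] = None
    inside_canvas: bool = False
    boxes: Tuple[Box, ...] = ()  # bounding box per snapshot, oldest first


def stability_score(el: ElementSnapshot) -> float:
    """Fraction of consecutive snapshot pairs with an unchanged bounding box.

    Assumed definition; returns 0.0 when there is too little history to judge.
    """
    if len(el.boxes) < 2:
        return 0.0
    pairs = list(zip(el.boxes, el.boxes[1:]))
    unchanged = sum(1 for before, after in pairs if before == after)
    return unchanged / len(pairs)


def choose_grounding(el: ElementSnapshot, min_stability: float = 0.8) -> str:
    """Return "dom" for stable, addressable elements and "vision" otherwise."""
    # Canvas/WebGL content has no per-element DOM nodes, so route it to vision.
    if el.tag.lower() == "canvas" or el.inside_canvas:
        return "vision"
    # A DOM selector needs something durable to anchor on (ID or ARIA role).
    has_anchor = bool(el.element_id) or bool(el.aria_role)
    if has_anchor and stability_score(el) >= min_stability:
        return "dom"
    # Unanchored or shifting elements fall back to screenshot-based grounding.
    return "vision"
```

For example, a button with an ID whose box is identical across three snapshots routes to `"dom"`, while a `<canvas>` element, or a `div` whose box moved in half of the observed snapshot pairs, routes to `"vision"`. Defaulting to vision when there is no history is a conservative choice in this sketch, not something the report prescribes.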
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:53:33.963426+00:00— report_created — created