Report #87883
[frontier] Agents using browser accessibility trees (AXTrees) fail on modern SPAs because ARIA labels are auto-generated or missing for Canvas/WebGL UIs
Hybrid grounding: generate candidate elements from the AXTree, then validate interactability by projecting each candidate's bounding box onto the screenshot and using a VLM to verify that the visual element matches its semantic description.
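A minimal Python sketch of the candidate-generation and projection step. It assumes each AXTree node already carries a bounding box in CSS pixels relative to the viewport (accessibility trees do not always expose geometry, so this may have to come from the DOM layout separately) and that the screenshot is a viewport capture scaled by the device pixel ratio. `AXNode`, `Candidate`, `INTERACTIVE_ROLES`, `project_bbox` and `generate_candidates` are hypothetical names, not part of any browser API, and the role list is an illustrative subset.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Assumed, non-exhaustive set of roles treated as interactive candidates.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox",
                     "menuitem", "tab", "slider"}


@dataclass
class AXNode:
    role: str
    name: str  # accessible name; may be empty or auto-generated on SPAs
    # (x, y, width, height) in CSS pixels relative to the viewport, if known.
    bbox: Optional[tuple[float, float, float, float]] = None
    children: list[AXNode] = field(default_factory=list)


@dataclass
class Candidate:
    role: str
    name: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in screenshot pixels


def project_bbox(bbox, device_pixel_ratio, shot_w, shot_h):
    """Scale a CSS-pixel box to screenshot pixels and clip it; None if nothing is visible."""
    x, y, w, h = bbox
    left = max(0, round(x * device_pixel_ratio))
    top = max(0, round(y * device_pixel_ratio))
    right = min(shot_w, round((x + w) * device_pixel_ratio))
    bottom = min(shot_h, round((y + h) * device_pixel_ratio))
    if right <= left or bottom <= top:
        return None  # zero-area or entirely outside the screenshot
    return (left, top, right, bottom)


def generate_candidates(root, device_pixel_ratio, shot_w, shot_h):
    """Walk the tree depth-first and keep interactive nodes with a visible projected box."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.role in INTERACTIVE_ROLES and node.bbox is not None:
            box = project_bbox(node.bbox, device_pixel_ratio, shot_w, shot_h)
            if box is not None:
                found.append(Candidate(node.role, node.name, box))
        stack.extend(reversed(node.children))  # preserve document order
    return found


# Example: a 1280x720 viewport captured at device pixel ratio 2.
root = AXNode("WebArea", "", children=[
    AXNode("button", "Export", (10, 20, 80, 30)),
    AXNode("button", "", (2000, 0, 50, 50)),        # scrolled off to the right
    AXNode("generic", "canvas", (0, 0, 1280, 720)),  # canvas with no interactive role
])
candidates = generate_candidates(root, 2.0, 2560, 1440)
```

Clipping to the capture matters because the tree also lists off-screen nodes; a box that projects outside the screenshot cannot be visually verified, so this sketch drops it before any VLM call is spent on it.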
Journey Context:
AXTrees provide semantic structure but lack visual fidelity, and modern web apps (Figma-like tools, React Canvas) have poor ARIA coverage. Pure vision, on the other hand, misses semantics. The hybrid approach uses the tree for candidate generation (efficient) and vision for verification (robust), filtering out false positives caused by accessibility-tree noise.
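The verification and filtering step can be sketched in Python as well. The VLM is abstracted as a `verify(crop, description) -> float` callable returning a match score; that interface, the prompt wording in `describe`, the 0.5 `threshold`, the nested-list image type and the `Candidate` record (role, accessible name, pixel box) are all hypothetical choices for illustration. A real pipeline would crop with an image library and call an actual vision-language model; `toy_verify` below is only a stand-in that scores non-white pixels.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Pixel = tuple[int, int, int]
Image = list[list[Pixel]]  # row-major RGB; stand-in for a real image type


@dataclass
class Candidate:
    role: str
    name: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in screenshot pixels


# Hypothetical VLM hook: score in [0, 1] for "does this crop show <description>?"
Verifier = Callable[[Image, str], float]


def crop(image: Image, box: tuple[int, int, int, int]) -> Image:
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def describe(c: Candidate) -> str:
    """Text half of the VLM query; the AXTree name may be empty on canvas-heavy UIs."""
    return f"{c.role} labeled '{c.name}'" if c.name else f"unlabeled {c.role}"


def hybrid_ground(candidates, screenshot: Image, verify: Verifier, threshold=0.5):
    """Keep candidates whose crop the verifier accepts, highest score first."""
    scored = []
    for c in candidates:
        patch = crop(screenshot, c.box)
        if not patch or not patch[0]:
            continue  # box lies outside the image: nothing to verify
        score = verify(patch, describe(c))
        if score >= threshold:
            scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


# Toy stand-in for a VLM (ignores the description): fraction of non-white pixels.
def toy_verify(patch: Image, description: str) -> float:
    pixels = [p for row in patch for p in row]
    return sum(p != (255, 255, 255) for p in pixels) / len(pixels)


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
shot = [[WHITE] * 4 for _ in range(4)]
for r in (0, 1):
    for col in (0, 1):
        shot[r][col] = BLACK  # a visible 2x2 widget in the top-left corner

export = Candidate("button", "Export", (0, 0, 2, 2))  # on the widget
ghost = Candidate("button", "", (2, 2, 4, 4))         # blank area: AXTree false positive
strip = Candidate("tab", "Layers", (0, 0, 4, 2))      # half covered by the widget
grounded = hybrid_ground([strip, ghost, export], shot, toy_verify)
```

Here `ghost` is discarded because its crop shows nothing, which is the false-positive filtering the hybrid approach relies on. Returning a ranked shortlist rather than a single pick leaves the final choice to the agent; the threshold would likely need tuning against whichever VLM is used.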
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:05:43.323240+00:00— report_created — created