Report #54238
[frontier] Screenshot agents missing semantic structure from ARIA/accessibility trees
For accessibility-critical applications, always pair screenshots with the accessibility snapshot (AXTree) from Playwright; use ARIA for 'what to do' and screenshots for 'how it looks'.
Journey Context:
Screenshots capture pixels but miss semantic roles (button vs. link), accessible names (ARIA labels), and keyboard navigation order. An agent that sees a 'hamburger menu' icon cannot tell whether it is a button or a plain div without the accessibility tree. Conversely, accessibility trees miss state that is conveyed only through visual styling (for example, a control that is grayed out but not marked as disabled). The pattern is 'semantic intent from AXTree, visual verification from screenshot', which is essential for WCAG-compliant automation and for complex web apps that use ARIA extensively.
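A minimal sketch of the pairing pattern in Python follows. It assumes a recent Playwright release in which `locator.aria_snapshot()` is available; the report does not say which snapshot API was used, so that choice is an assumption. The names `AriaNode`, `parse_aria_snapshot`, `actionable`, and `capture_pair` are hypothetical helpers, and the line parser handles only the simple `- role "name" [attribute]` form of the snapshot text, not every construct the format allows.

```python
import re
from typing import NamedTuple


class AriaNode(NamedTuple):
    role: str
    name: str
    disabled: bool


# Matches snapshot lines such as:  - button "Open menu" [disabled]
_LINE = re.compile(
    r'^\s*-\s+(?P<role>[a-z]+)(?:\s+"(?P<name>(?:[^"\\]|\\.)*)")?(?P<rest>.*)$'
)


def parse_aria_snapshot(snapshot: str) -> list[AriaNode]:
    """Extract role, accessible name, and disabled flag from each node line."""
    nodes = []
    for line in snapshot.splitlines():
        m = _LINE.match(line)
        if not m:
            continue  # property lines such as "- /url: ..." are skipped
        # Only look for attributes before any ":" that introduces text or children.
        attrs = m["rest"].split(":", 1)[0]
        nodes.append(AriaNode(m["role"], m["name"] or "", "[disabled]" in attrs))
    return nodes


def actionable(nodes: list[AriaNode], roles=("button", "link")) -> list[AriaNode]:
    """Semantic intent: which enabled controls could the agent act on?"""
    return [n for n in nodes if n.role in roles and not n.disabled]


def capture_pair(url: str, screenshot_path: str) -> tuple[str, str]:
    """Capture the ARIA snapshot and a screenshot of the same page state."""
    # Imported lazily so the parsing helpers work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        aria_text = page.locator("body").aria_snapshot()
        page.screenshot(path=screenshot_path, full_page=True)
        browser.close()
    return aria_text, screenshot_path
```

In use, an agent would call `capture_pair` on the page under test, pass the returned text through `parse_aria_snapshot` and `actionable` to decide what to do, and then hand the saved screenshot to its vision step to check how the chosen control actually looks.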
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:32:03.987134+00:00— report_created — created