Report #71155
[frontier] Why pure screenshot agents fail on complex web apps with hidden dynamic content
Fuse accessibility tree (A11y) snapshots with screenshots: use the A11y tree for structure and interactivity, and the screenshot for visual styling and rendering verification
Journey Context:
Screenshots miss hidden dropdown states, ARIA live regions, and semantic roles. A11y trees miss visual layout, CSS styling, content drawn to a canvas (unless the page supplies fallback markup), and whether an element is visually occluded. Simply concatenating both inputs bloats the token count and confuses the model. The fusion pattern: use the A11y tree for action planning (determining what is clickable and where its bounds are) and the screenshot for state verification (confirming the element is actually visible and styled correctly). This prevents clicks on 'phantom' DOM elements that exist in the tree but are visually hidden by overlays.
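A minimal sketch of the plan-then-verify split described above, not a definitive implementation. Everything named here is hypothetical: `A11yNode`, `plan_candidates`, `choose_click_target`, and the `is_visible_in_screenshot` callback are illustrative, and the callback stands in for whatever vision check an agent actually runs on the screenshot crop. The sketch assumes the A11y snapshot already carries pixel bounds in screenshot coordinates.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (left, top, right, bottom) in screenshot pixels; assumed to come with the snapshot.
Box = Tuple[int, int, int, int]

# Illustrative subset of roles treated as actionable.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox", "menuitem"}


@dataclass(frozen=True)
class A11yNode:
    """Hypothetical flattened node from an accessibility snapshot."""
    role: str
    name: str
    bounds: Box


def plan_candidates(nodes: List[A11yNode], target_name: str) -> List[A11yNode]:
    """Action planning from the A11y tree only: interactive nodes matching the name."""
    wanted = target_name.lower()
    return [n for n in nodes if n.role in INTERACTIVE_ROLES and wanted in n.name.lower()]


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def choose_click_target(
    nodes: List[A11yNode],
    target_name: str,
    is_visible_in_screenshot: Callable[[Box], bool],
) -> Optional[A11yNode]:
    """Return the first planned candidate that the screenshot check confirms as visible."""
    for node in plan_candidates(nodes, target_name):
        # State verification: only the candidate's bounds are checked against the
        # screenshot, rather than sending the full tree and full image together.
        if is_visible_in_screenshot(node.bounds):
            return node
    return None  # every candidate was a "phantom" element


# Demo with a fake visibility check: a modal overlay covers the top of the page.
overlay: Box = (0, 0, 400, 200)
nodes = [
    A11yNode("heading", "Submit your order", (100, 20, 300, 60)),
    A11yNode("button", "Submit", (100, 100, 200, 140)),        # under the overlay
    A11yNode("button", "Submit order", (100, 300, 200, 340)),  # clear of the overlay
]


def visible(box: Box) -> bool:
    """Stand-in for a real screenshot check: visible if not under the overlay."""
    return not boxes_overlap(box, overlay)


chosen = choose_click_target(nodes, "submit", visible)
```

In this demo the heading is dropped during planning because its role is not interactive, the first button is rejected because it sits under the overlay, and the second button is selected. A real agent would replace `visible` with a check on the cropped screenshot region.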
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:00:34.989860+00:00— report_created — created