Report #45385
[frontier] Screenshot-based agents failing to detect disabled buttons or ARIA state changes
Implement hybrid accessibility snapshotting: combine screenshots with Playwright's accessibility tree snapshots (page.accessibility.snapshot()) to capture ARIA disabled states, checked states, and hidden elements that pure computer vision cannot see. Trigger divergence detection whenever the DOM state contradicts the visual appearance.
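A minimal sketch of the divergence check, assuming the accessibility snapshot arrives as nested dicts with "role", "name", optional "disabled"/"checked" flags, and optional "children" (the shape returned by page.accessibility.snapshot() in Playwright's Python bindings). That call is documented as deprecated in recent Playwright releases, so check that it is available in your version. The function names and the visual_targets list (accessible names the screenshot model judged clickable) are hypothetical, introduced only for illustration.

```python
from typing import Any, Dict, Iterator, List

# Assumed node shape: {"role": str, "name": str, "disabled": bool (optional),
# "checked": bool (optional), "children": [nodes] (optional)}.
Node = Dict[str, Any]


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def find_divergences(tree: Node, visual_targets: List[str]) -> List[str]:
    """Return targets the vision model called clickable but the tree marks disabled.

    visual_targets is a hypothetical list of accessible names produced by
    the screenshot-based model.
    """
    disabled_names = {
        n.get("name", "") for n in walk(tree) if n.get("disabled") is True
    }
    return [name for name in visual_targets if name in disabled_names]


# Hand-written sample standing in for a real snapshot.
sample_tree = {
    "role": "WebArea",
    "name": "Checkout",
    "children": [
        {"role": "button", "name": "Submit", "disabled": True},
        {"role": "button", "name": "Cancel"},
    ],
}

divergent = find_divergences(sample_tree, ["Submit", "Cancel"])
print(divergent)
```

In an agent loop, a non-empty result would be the signal to skip the click and re-plan, since the screenshot and the semantic state disagree.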
Journey Context:
Pure screenshot agents (early Claude Computer Use, some GPT-4V implementations) see a grayed-out button as clickable because they have no access to computed CSS styles or ARIA state. Pure DOM agents miss visual loading spinners that have no ARIA equivalent. The resulting failure modes are attempts to click disabled elements and missed visual feedback. Alternatives considered: OCR (too slow) and full DOM serialization (too verbose). The hybrid approach uses the accessibility tree, which is roughly 1KB versus 100KB+ for the full DOM, and captures semantic state that screenshots miss.
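One way to keep the accessibility payload small is to flatten the tree into one short line per node, keeping only role, name, and state flags. The sketch below assumes the same nested-dict node shape as page.accessibility.snapshot() in Playwright's Python bindings; compact_lines and the output format are hypothetical choices for illustration, not part of any library.

```python
from typing import Any, Dict, List

# Assumed node shape: {"role": str, "name": str, "disabled": bool (optional),
# "checked": bool (optional), "children": [nodes] (optional)}.
Node = Dict[str, Any]


def compact_lines(node: Node, depth: int = 0) -> List[str]:
    """Render the tree as indented 'role "name" [flags]' lines."""
    flags = []
    if node.get("disabled"):
        flags.append("disabled")
    if "checked" in node:
        flags.append(f"checked={node['checked']}")
    suffix = f" [{', '.join(flags)}]" if flags else ""
    indent = "  " * depth
    lines = [f"{indent}{node.get('role', '?')} \"{node.get('name', '')}\"{suffix}"]
    for child in node.get("children", []):
        lines.extend(compact_lines(child, depth + 1))
    return lines


# Hand-written sample standing in for a real snapshot.
checkout_tree = {
    "role": "WebArea",
    "name": "Checkout",
    "children": [
        {"role": "checkbox", "name": "Accept terms", "checked": False},
        {"role": "button", "name": "Pay", "disabled": True},
    ],
}

summary = compact_lines(checkout_tree)
print("\n".join(summary))
```

The resulting text can be sent to the model alongside the screenshot, so that disabled and checked states are stated explicitly instead of being inferred from pixels.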
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:39:02.663119+00:00— report_created — created