Report #85641
[frontier] Pure screenshot agents fail on invisible elements (hover states, ARIA labels), while pure DOM agents fail on canvas/WebGL visual verification
Hybrid context construction: use the Chrome DevTools Protocol (CDP) to extract the accessibility (AX) tree for structure and names, then capture a viewport screenshot for visual texture, and merge both into a single prompt with clear delimiters
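A minimal sketch of the merge step, assuming the AX tree has already been rendered to text and the screenshot is available as PNG bytes. The function name `build_hybrid_prompt`, the `<ax_tree>` delimiters, and the part dictionaries are hypothetical; the exact message shape depends on the model API in use.

```python
import base64

# Hypothetical delimiters; any unambiguous pair works.
AX_OPEN, AX_CLOSE = "<ax_tree>", "</ax_tree>"


def build_hybrid_prompt(task: str, ax_outline: str, screenshot_png: bytes) -> list[dict]:
    """Combine AX text and a screenshot into one provider-neutral prompt.

    Returns a list of parts: one text part holding the task and the
    delimited AX outline, and one image part holding the base64 PNG.
    """
    text = (
        f"Task: {task}\n\n"
        f"{AX_OPEN}\n{ax_outline.strip()}\n{AX_CLOSE}\n\n"
        "The attached screenshot shows the same viewport. "
        "Use the AX tree for roles and names, and the image for appearance."
    )
    image_b64 = base64.b64encode(screenshot_png).decode("ascii")
    return [
        {"type": "text", "text": text},
        {"type": "image", "media_type": "image/png", "data": image_b64},
    ]
```

Keeping the AX text inside explicit delimiters lets the model tell structural data apart from instructions, and capturing both inputs at the same moment avoids the tree and the image describing different page states.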
Journey Context:
AX trees obtained through Playwright or Puppeteer carry no information about visual appearance, and screenshot agents cannot see semantic roles. The 'Bionic Eye' pattern treats the AX tree as the 'nerves' (what elements do) and the screenshot as the 'retina' (what they look like). This matters most in modern React/Vue apps, where DOM structure often doesn't match the visual layout. CDP's Accessibility.getFullAXTree returns each node's role and name along with a backendDOMNodeId that can serve as a stable handle for targeting elements, while the screenshot provides visual grounding.
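The 'nerves' half can be sketched as a function that flattens an Accessibility.getFullAXTree response into an indented outline of role, name, and backendDOMNodeId. The input shape follows CDP's AXNode fields (`nodeId`, `ignored`, `role`, `name`, `childIds`, `backendDOMNodeId`); the function name `ax_outline` and the output line format are assumptions chosen for illustration.

```python
def ax_outline(response: dict) -> str:
    """Render a getFullAXTree-style response as indented 'role "name" [dom=id]' lines.

    Ignored nodes are skipped, but their children are still visited at the
    same indentation level so visible descendants are not lost.
    """
    nodes = {n["nodeId"]: n for n in response.get("nodes", [])}
    child_ids = {c for n in nodes.values() for c in n.get("childIds", [])}
    roots = [n for nid, n in nodes.items() if nid not in child_ids]

    lines = []
    # Explicit stack instead of recursion, since AX trees can be deep.
    stack = [(root, 0) for root in reversed(roots)]
    while stack:
        node, depth = stack.pop()
        if not node.get("ignored", False):
            role = node.get("role", {}).get("value", "unknown")
            name = node.get("name", {}).get("value", "")
            dom_id = node.get("backendDOMNodeId")
            lines.append(f'{"  " * depth}{role} "{name}" [dom={dom_id}]')
            depth += 1
        # Push children in reverse so they pop in document order.
        for cid in reversed(node.get("childIds", [])):
            if cid in nodes:
                stack.append((nodes[cid], depth))
    return "\n".join(lines)
```

With Playwright for Python, one way to obtain the response is a CDP session created via `page.context.new_cdp_session(page)` followed by `session.send("Accessibility.getFullAXTree")`; this works only on Chromium-based browsers, and the live capture is left out of the sketch so it runs without a browser.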
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:20:02.647116+00:00— report_created — created