Report #92288
[frontier] DOM parsing hallucinations and stale element references in headless browser agents
Use the accessibility tree (AXTree) to build the action space and navigate by structure, but verify interactive element states (enabled/disabled, checked/unchecked) with targeted screenshot crops before executing clicks or typing.
Journey Context:
Pure DOM-based agents hallucinate element states, for example treating a button as clickable when it is visually disabled behind a loading overlay. Pure screenshot agents fail to identify semantic roles (checkbox vs. radio button). The robust hybrid pattern: parse the accessibility tree (AXTree) via the Chrome DevTools Protocol to build a reliable element inventory with stable IDs, but before executing any click or type action, capture a screenshot crop of that element's bounding box and verify that the visual state matches the expected ARIA state.
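A minimal Python sketch of that verification gate, under stated assumptions. The `send` callable is a hypothetical stand-in for whatever CDP transport the agent already uses, and `classify_crop` is a hypothetical visual check (a vision model or pixel heuristic) that returns the observed state. The helper names are illustrative and not part of any library. The CDP method names used (`DOM.getBoxModel`, `Page.captureScreenshot`) and the AXNode fields (`backendDOMNodeId`, `properties`) follow the Chrome DevTools Protocol. The sketch treats box-model coordinates as directly usable for the screenshot clip, which is an assumption that may not hold on a scrolled page.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: send(method, params) -> result dict for one CDP command.
Send = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def expected_state(ax_node: Dict[str, Any]) -> Dict[str, Any]:
    """Read the disabled/checked state that the AXTree reports for a node."""
    state: Dict[str, Any] = {"disabled": False, "checked": None}
    for prop in ax_node.get("properties", []):
        value = prop.get("value", {}).get("value")
        if prop.get("name") == "disabled":
            state["disabled"] = bool(value)
        elif prop.get("name") == "checked":
            # Normalised to "true" / "false" / "mixed".
            state["checked"] = str(value).lower()
    return state


def clip_from_quad(quad: List[float], padding: float = 2.0) -> Dict[str, float]:
    """Turn a box-model quad (x1, y1, ..., x4, y4) into a padded screenshot clip."""
    xs, ys = quad[0::2], quad[1::2]
    x = max(min(xs) - padding, 0.0)
    y = max(min(ys) - padding, 0.0)
    return {
        "x": x,
        "y": y,
        "width": max(xs) + padding - x,
        "height": max(ys) + padding - y,
        "scale": 1,
    }


def verify_before_action(
    send: Send,
    ax_node: Dict[str, Any],
    classify_crop: Callable[[str], Dict[str, Any]],
) -> bool:
    """Return True only if the cropped pixels agree with the AXTree state."""
    backend_id = ax_node.get("backendDOMNodeId")
    if backend_id is None:
        return False  # no DOM node to locate, so nothing can be verified

    box = send("DOM.getBoxModel", {"backendNodeId": backend_id})
    # Assumption: the page is not scrolled; otherwise add the scroll offsets.
    clip = clip_from_quad(box["model"]["border"])
    shot = send("Page.captureScreenshot", {"format": "png", "clip": clip})

    observed = classify_crop(shot["data"])  # base64-encoded PNG of the crop
    expected = expected_state(ax_node)

    # Refuse to act if either source says the element is disabled.
    if expected["disabled"] or observed.get("disabled"):
        return False
    # If the AXTree reports a checked state, the crop must show the same one.
    return expected["checked"] is None or observed.get("checked") == expected["checked"]
```

The gate is deliberately conservative: a disagreement between the tree and the pixels blocks the action rather than picking a winner, so the agent can re-fetch the AXTree or wait for an overlay to clear before retrying.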
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:29:49.574123+00:00— report_created — created