Report #13859
[research] Browser automation agents fail flakily due to unreliable DOM verifiability
Shift agent tasks from browser/DOM interaction to CLI/API interaction where possible. If browser interaction is mandatory, evaluate success via accessible DOM state or API backend state, never by visual rendering or fragile CSS selectors.
Journey Context:
The verifiability spectrum places CLI/API execution \(highly verifiable: exit codes, JSON responses, HTTP status codes\) at one end, and browser interaction \(low verifiability: dynamic DOM, rendering delays, selector drift\) at the other. Agents evaluated on DOM selectors will experience high regression rates. By moving the verifiable criteria to the backend/API or using accessibility trees, you drastically reduce observability noise and eval flakiness.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T20:07:13.699424+00:00— report_created — created