Report #93988
[research] How to evaluate agent actions when browser DOM interactions are unreliable and non-deterministic
Shift verifiable assertions away from DOM state and onto the network layer (HTTP requests and API responses) or CLI stdout. If browser evaluation is unavoidable, assert against accessibility tree snapshots rather than pixel comparisons or CSS selectors.
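Here is a minimal Python sketch of the network-layer and CLI checks. It assumes the harness can capture the agent's HTTP traffic as a list of dicts (for example, through a recording proxy) and that the CLI under test prints JSON. The `request_log` shape, the `/api/cart/items` endpoint, and the helper names `json_subset`, `find_request`, and `run_cli_json` are all hypothetical. The core idea is subset matching: assert only on the fields the task cares about, and ignore volatile ones such as timestamps and trace IDs.

```python
from __future__ import annotations

import json
import subprocess
import sys
from typing import Any


def json_subset(expected: Any, actual: Any) -> bool:
    """True if everything in `expected` also appears in `actual`.

    Dicts match recursively by key and ignore extra keys in `actual`.
    Lists must have equal length and match element-wise. Scalars use ==.
    """
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            k in actual and json_subset(v, actual[k]) for k, v in expected.items()
        )
    if isinstance(expected, list):
        return (
            isinstance(actual, list)
            and len(expected) == len(actual)
            and all(json_subset(e, a) for e, a in zip(expected, actual))
        )
    return expected == actual


def find_request(log: list[dict], method: str, path: str) -> dict | None:
    """Return the first captured request with this method and path, else None."""
    for entry in log:
        if entry["method"] == method and entry["path"] == path:
            return entry
    return None


def run_cli_json(argv: list[str]) -> Any:
    """Run a command, raise on non-zero exit, and parse its stdout as JSON."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Hypothetical traffic captured while the agent worked (e.g. by a proxy).
request_log = [
    {"method": "GET", "path": "/api/cart", "status": 200, "body": None},
    {"method": "POST", "path": "/api/cart/items", "status": 201,
     "body": {"sku": "A-123", "qty": 2, "client_ts": 1718, "trace_id": "x9"}},
]

req = find_request(request_log, "POST", "/api/cart/items")
assert req is not None and req["status"] == 201
# Volatile fields (client_ts, trace_id) are ignored by the subset match.
assert json_subset({"sku": "A-123", "qty": 2}, req["body"])

# Stand-in for a real CLI: a Python one-liner that prints JSON to stdout.
out = run_cli_json([sys.executable, "-c",
                    'import json; print(json.dumps({"ok": True, "items": [{"sku": "A-123"}]}))'])
assert json_subset({"ok": True, "items": [{"sku": "A-123"}]}, out)
```

In this sketch, lists are compared element-wise with equal length, so order matters inside arrays. If an API returns unordered collections, relax that rule.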
Journey Context:
Browser automation evals flake because DOM rendering, dynamically generated class names, and visual states vary from run to run. CLI and API interactions return structured stdout or JSON responses, which makes them much easier to verify. When the browser is unavoidable, the accessibility tree offers a more stable, text-based view of the page. It exposes roles and accessible names rather than styling and layout, so it sidesteps most visual flakiness.
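Here is a sketch of a stable accessibility-tree assertion in Python. The `snapshot` dict is handwritten, not captured from a real page. It is shaped like the nested `role`/`name`/`children` dicts returned by Playwright's `page.accessibility.snapshot()`. The helpers `flatten` and `has_node` are hypothetical. The check keeps only roles and whitespace-normalized accessible names, so CSS classes, layout, and focus state cannot cause a diff.

```python
from __future__ import annotations

from typing import Iterator


def _norm(text: str) -> str:
    """Collapse runs of whitespace so cosmetic spacing never causes a diff."""
    return " ".join(text.split())


def flatten(node: dict, depth: int = 0) -> Iterator[str]:
    """Yield one 'role "name"' line per node, indented two spaces per level.

    Only role and name are kept. Other attributes (focused, level, etc.)
    are dropped, so the text stays comparable across runs.
    """
    indent = "  " * depth
    yield f'{indent}{node.get("role", "")} "{_norm(node.get("name", ""))}"'
    for child in node.get("children", []):
        yield from flatten(child, depth + 1)


def has_node(node: dict, role: str, name: str) -> bool:
    """True if any node in the tree has exactly this role and normalized name."""
    if node.get("role") == role and _norm(node.get("name", "")) == name:
        return True
    return any(has_node(c, role, name) for c in node.get("children", []))


# Handwritten sample shaped like a Playwright accessibility snapshot.
snapshot = {
    "role": "WebArea",
    "name": "Checkout",
    "children": [
        {"role": "heading", "name": "Order  summary", "level": 2},
        {"role": "button", "name": "Place order", "focused": True},
        {"role": "status", "name": "Order 1042 confirmed"},
    ],
}

assert has_node(snapshot, "status", "Order 1042 confirmed")
assert not has_node(snapshot, "alert", "Payment failed")
print("\n".join(flatten(snapshot)))
# WebArea "Checkout"
#   heading "Order summary"
#   button "Place order"
#   status "Order 1042 confirmed"
```

You can store the flattened text as a golden file and diff it between runs. When only the outcome of the agent's action matters, `has_node` is the looser option.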
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:20:46.299807+00:00— report_created — created