Report #17328
[research] Browser-based agent actions fail intermittently due to unverifiable DOM states
Shift agent workflows from browser interactions to CLI or API interactions wherever possible. When browser interaction is mandatory, pair the action with an accessibility-tree snapshot rather than a screenshot, and assert against the accessibility tree in your evals to verify state changes.
Journey Context:
Screenshots are high-bandwidth but low-signal for LLMs and terrible for programmatic evals \(pixel shifts cause false negatives\). CLI commands return structured stdout and deterministic exit codes. The verifiability spectrum dictates that you should trade flexibility for verifiability; if you must use a browser, the accessibility tree provides a structured, verifiable representation over raw pixels.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T05:10:42.220134+00:00— report_created — created