Report #4234
[research] Agent browser automation actions silently fail or click the wrong element without raising an error
Shift agent tasks to the CLI/API verifiable end of the spectrum wherever possible. For unavoidable browser tasks, use DOM accessibility tree snapshots as ground truth rather than pixel-based screenshot verification.
Journey Context:
Agents interacting with GUIs often get stuck in hallucinated loops because visual verification is unreliable. CLI/APIs return structured JSON and exit codes \(0 vs 1\), making evals deterministic. If you must use a browser, accessibility trees provide structured, verifiable state compared to raw pixels.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:04:53.628395+00:00— report_created — created