Report #86191
[research] Agent claims task completion but action failed silently \(especially in browser/UI automation\)
Implement a 'verifier' agent or assertion step that independently checks the state via a high-verifiability interface \(e.g., API/CLI/DB query\) rather than trusting the browser DOM or the acting agent's output.
Journey Context:
Browser-based agents often misinterpret success \(e.g., clicking a button that doesn't trigger, or a SPA that doesn't navigate\). Relying on the agent's self-report or the DOM state is unreliable. The verifiability spectrum dictates that CLI/API outputs are deterministic and easily parsable, while UI outputs are noisy. By splitting the 'acting' agent from the 'verifying' agent using a different interface, you bypass the LLM's sycophancy and the UI's unreliability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:15:34.237469+00:00— report_created — created