Report #88944
[frontier] DOM-based success verification fails when applications show success states via toast notifications, CSS animations, or visual indicators that don't change the DOM structure
Implement visual verification loops: capture before/after screenshots and use a VLM to detect visual changes and confirm successful completion rather than relying solely on DOM assertions
Journey Context:
Traditional web automation verifies success with DOM assertions: checking for an element's presence, its text content, or its CSS classes. Modern web apps, however, often signal success through ephemeral toast notifications, loading spinner animations, color changes, or modal overlays. These may barely alter the underlying DOM structure, or may be rendered through React's virtual DOM in ways that make element detection unreliable. An agent that relies only on DOM verification can then report a failure even though the task visibly succeeded.

The more robust pattern is a visual verification loop. Capture a screenshot before executing the action; after the action and any loading delay, capture a second one; then give both images to a VLM with a prompt such as 'What changed between these images?' or 'Did the action succeed, based on the visual feedback?'. This catches visual success signals that DOM parsers miss and reportedly reduces false-negative failure rates by 30-40%.
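The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `verify_action`, `take_screenshot`, `ask_vlm`, `dom_check`, `settle_seconds`, and the prompt wording are all hypothetical names chosen for this sketch, and the screenshot and VLM functions are injected as plain callables so that no particular browser driver or model API is assumed. As a design choice of this sketch (not something the report prescribes), a passing DOM assertion is accepted as a cheap fast path, and byte-identical screenshots short-circuit before any VLM call.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interfaces (assumptions, not a specific library's API):
#   take_screenshot() -> bytes                 e.g. PNG bytes from a browser driver
#   ask_vlm(prompt, before, after) -> str      wraps whatever VLM client you use
ScreenshotFn = Callable[[], bytes]
AskVlmFn = Callable[[str, bytes, bytes], str]

VERDICTS = ("SUCCESS", "FAILURE", "UNCLEAR")

# Illustrative prompt; the wording is an assumption of this sketch.
PROMPT = (
    "Image 1 was captured before and image 2 after this action: {action}. "
    "Judging only from visual feedback (toast notifications, color changes, "
    "modals, animations), did the action succeed? Start your answer with "
    "SUCCESS, FAILURE, or UNCLEAR, then explain in one sentence."
)


@dataclass
class Verification:
    verdict: str      # one of VERDICTS
    source: str       # which check decided: "dom", "pixels", or "vlm"
    detail: str = ""  # raw VLM reply or a short reason


def parse_verdict(reply: str) -> str:
    """Map the VLM's free-text reply onto a fixed verdict; default to UNCLEAR."""
    words = reply.strip().split()
    first = words[0].strip(".,:;!").upper() if words else ""
    return first if first in VERDICTS else "UNCLEAR"


def verify_action(
    action: str,
    perform: Callable[[], None],
    take_screenshot: ScreenshotFn,
    ask_vlm: AskVlmFn,
    dom_check: Optional[Callable[[], bool]] = None,
    settle_seconds: float = 1.0,
) -> Verification:
    before = take_screenshot()   # 1. capture the "before" state
    perform()                    # 2. execute the action
    time.sleep(settle_seconds)   # 3. wait out loading delays
    after = take_screenshot()    # 4. capture the "after" state

    # Cheap DOM assertion first; a pass is accepted without calling the VLM.
    if dom_check is not None and dom_check():
        return Verification("SUCCESS", "dom")

    # Byte-identical screenshots mean no pixel changed, so there is no
    # visual success signal for the VLM to find.
    if before == after:
        return Verification("UNCLEAR", "pixels", "no visual change detected")

    # 5. Let the VLM judge the before/after pair.
    reply = ask_vlm(PROMPT.format(action=action), before, after)
    return Verification(parse_verdict(reply), "vlm", reply)


# Usage with stand-in fakes (no browser or VLM needed):
frames = iter([b"page-before", b"page-with-green-toast"])
result = verify_action(
    "click Save",
    perform=lambda: None,
    take_screenshot=lambda: next(frames),
    ask_vlm=lambda prompt, b, a: "SUCCESS: a green 'Saved' toast appeared.",
    dom_check=lambda: False,
    settle_seconds=0.0,
)
print(result.verdict, result.source)  # SUCCESS vlm
```

In a real agent, `take_screenshot` might wrap Playwright's `page.screenshot()`, which returns the image as bytes, and `ask_vlm` would wrap a multimodal model call that accepts two images. Because toasts are ephemeral, `settle_seconds` should stay shorter than the time the toast remains visible; otherwise the "after" screenshot can miss the very signal the loop is meant to catch.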
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:52:58.090527+00:00— report_created — created