Report #66604
[frontier] Screenshot-based agent fails on step 15 of 20 due to accumulated visual state drift
Insert visual assertion checkpoints every 5 actions that verify that specific visual landmarks (icons, layout structures) match expected templates, and trigger a rollback if visual entropy exceeds a threshold
Journey Context:
DOM-based agents track state via selectors that either exist or don't, but screenshot agents accumulate error like odometry drift: small coordinate misalignments compound until the agent clicks the wrong targets. The common fix is adding random sleeps, which is fragile because a sleep only waits and never checks where anything is on screen. This pattern treats visual state like a test suite: explicit assertions on visual invariants (e.g., 'settings gear icon must be top-right', 'sidebar width constant') catch drift early, before it compounds. The cost is maintaining reference templates for key states. It's the difference between open-loop control and closed-loop feedback in robotics, applied to UI automation.
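A minimal sketch of the checkpoint loop, not a verified implementation. It assumes screenshots are plain 2D lists of grayscale values and uses a brute-force mean-absolute-difference search in place of a real template matcher. The names Landmark, locate, failed_landmarks and run_with_checkpoints, the capture and restore callbacks, and the max_diff and tolerance_px thresholds are all hypothetical. The report's "visual entropy" is not defined, so the sketch substitutes a simple match-score plus pixel-offset test.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[int]]  # grayscale pixels, row-major (assumed representation)


@dataclass
class Landmark:
    name: str
    template: Image                # reference crop of the landmark
    expected_xy: Tuple[int, int]   # expected top-left corner as (x, y)
    tolerance_px: int = 2          # offset allowed before it counts as drift


def locate(screen: Image, template: Image) -> Tuple[Tuple[int, int], float]:
    """Exhaustive search: best top-left (x, y) and its mean absolute pixel diff."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    best_xy, best_score = (0, 0), float("inf")
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            diff = sum(
                abs(screen[y + dy][x + dx] - template[dy][dx])
                for dy in range(th)
                for dx in range(tw)
            )
            score = diff / (th * tw)
            if score < best_score:
                best_xy, best_score = (x, y), score
    return best_xy, best_score


def failed_landmarks(
    screen: Image, landmarks: Sequence[Landmark], max_diff: float = 10.0
) -> List[str]:
    """Names of landmarks that are missing (poor match) or displaced."""
    failures = []
    for lm in landmarks:
        (x, y), score = locate(screen, lm.template)
        ex, ey = lm.expected_xy
        offset = max(abs(x - ex), abs(y - ey))
        if score > max_diff or offset > lm.tolerance_px:
            failures.append(lm.name)
    return failures


def run_with_checkpoints(
    actions: Sequence[Callable[[], None]],
    capture: Callable[[], Image],
    landmarks: Sequence[Landmark],
    restore: Callable[[], None],
    every: int = 5,
) -> Optional[int]:
    """Run actions in order; return the step where a checkpoint failed, else None."""
    for i, action in enumerate(actions, start=1):
        action()
        if i % every == 0 and failed_landmarks(capture(), landmarks):
            restore()  # roll back to the last known-good state (caller-defined)
            return i
    return None
```

With every=5, drift introduced at step 7 would be caught at the step 10 checkpoint rather than surfacing as a wrong click at step 15. In practice the brute-force search would likely be swapped for a library template matcher, and restore would need to replay or reload whatever the last passing checkpoint saved.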
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:16:35.403551+00:00— report_created — created