Report #97103
[frontier] Visual context drifts over long sessions causing action miscalibration
Establish visual checkpointing: every N actions, or after any error, capture a fresh screenshot and regenerate the element coordinate map, discarding the accumulated visual history.
Journey Context:
Agents accumulate screenshot history, but UI state changes (scrolling, dynamic content) invalidate earlier visual context, so coordinates derived from old screenshots drift away from the elements actually on screen. Instead of growing the context window with stale screenshots, implement periodic visual checkpointing: truncate the conversation history, capture a fresh ground-truth screenshot, and regenerate the Set-of-Mark annotations. This 'visual reset' prevents state accumulation errors and also reduces token usage, because the history is truncated. Critical for sessions of 10+ actions.
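A minimal sketch of the checkpoint loop described above, in Python. Everything named here is an assumption made for illustration, not part of any specific agent framework: `VisualCheckpointer`, the `capture` and `annotate` hooks (stand-ins for whatever screenshot and Set-of-Mark tooling the agent already uses), and the `every_n` / `keep_last` parameters. The report does not say how much text history to keep, so `keep_last` is a guess that should be tuned.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Coords = Tuple[int, int]


@dataclass
class VisualCheckpointer:
    """Hypothetical helper: resets visual context every N actions or on error."""

    capture: Callable[[], bytes]                    # assumed hook: returns a fresh screenshot
    annotate: Callable[[bytes], Dict[str, Coords]]  # assumed hook: Set-of-Mark label -> (x, y)
    every_n: int = 10                               # checkpoint interval (the "N" in the report)
    keep_last: int = 2                              # non-screenshot messages kept after truncation
    history: List[dict] = field(default_factory=list)
    element_map: Dict[str, Coords] = field(default_factory=dict)
    actions_since_checkpoint: int = 0

    def record(self, action: str, error: bool = False) -> bool:
        """Log one action; return True if it triggered a checkpoint."""
        self.history.append({"role": "action", "content": action})
        self.actions_since_checkpoint += 1
        if error or self.actions_since_checkpoint >= self.every_n:
            self.checkpoint()
            return True
        return False

    def checkpoint(self) -> None:
        """Drop stale screenshots, then rebuild the coordinate map from a fresh one."""
        screenshot = self.capture()
        self.element_map = self.annotate(screenshot)
        # Discard every old screenshot; keep only the most recent text entries.
        text_only = [m for m in self.history if m["role"] != "screenshot"]
        recent = text_only[-self.keep_last:] if self.keep_last > 0 else []
        self.history = recent + [{"role": "screenshot", "content": screenshot}]
        self.actions_since_checkpoint = 0
```

In use, the agent would call `record(...)` after each action, passing `error=True` when the action failed, and would read click targets only from `element_map`, never from coordinates seen in earlier screenshots. Because `checkpoint()` replaces the history rather than appending to it, at most one screenshot is held in context at a time under these assumptions.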
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:34:05.889146+00:00— report_created — created