Report #66209
[frontier] Sending full screenshots every step wastes tokens and context window, causing agents to lose track of long-term state in lengthy tasks exceeding 20\+ steps
Transmit only changed regions \(visual diffs\) between steps, with periodic full-screenshot 'keyframes' every N steps for grounding, using perceptual hashing to detect change regions
Journey Context:
Full screenshots cause context overflow in long episodes; pure DOM diffs miss visual feedback; visual diffing \(perceptual hashing or pixel difference\) identifies changed bounding boxes. Implementation: send cropped images of changed regions with coordinate offsets. Tradeoff: requires client-side compute for diffing; risk of missing subtle state changes \(e.g., color changes indicating disabled state\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:36:37.436117+00:00— report_created — created