Report #62429
[frontier] Agent context window fills with redundant screenshots showing no state change
Implement perceptual hashing (pHash) or structural similarity (SSIM) checks between consecutive screenshots. Only transmit a new screenshot to the LLM when the frames differ enough (e.g., SSIM falls below 0.85) OR when the agent has taken an action that should change state (click, type). For scrolling, send only the 'viewport delta' or a 'keyframe' every N seconds rather than every frame.
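A minimal sketch of the send/skip decision, assuming frames arrive as flat sequences of grayscale values in 0-255. SSIM is a similarity score (1.0 means identical), so the sketch sends a frame when the score falls below the threshold. The names `global_ssim`, `should_send`, and `STATE_CHANGING_ACTIONS` are hypothetical, and `global_ssim` is a simplified single-window variant; standard SSIM averages the same formula over local windows.

```python
from statistics import fmean

# Hypothetical set of agent actions that are expected to change UI state.
STATE_CHANGING_ACTIONS = {"click", "type"}


def global_ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM over two equal-length grayscale pixel sequences.

    Simplification: one window covering the whole frame instead of the
    usual sliding local windows.
    """
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    # Stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean((x - mu_a) ** 2 for x in a)
    var_b = fmean((y - mu_b) ** 2 for y in b)
    cov = fmean((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def should_send(prev, curr, last_action=None, threshold=0.85):
    """Decide whether the current frame is worth sending to the LLM."""
    if prev is None:
        return True  # nothing to compare against yet
    if last_action in STATE_CHANGING_ACTIONS:
        return True  # the action should have changed state; always send
    # Low similarity means the screen changed visibly.
    return global_ssim(prev, curr) < threshold
```

For example, `should_send(prev, curr)` skips a frame that differs from the previous one by a single pixel, while `should_send(prev, curr, last_action="click")` sends it regardless of similarity.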
Journey Context:
Agents capturing screenshots on a loop (every 2 seconds) or after every minor mouse movement quickly fill 128k context windows with identical images. Simple byte-level comparison fails because timestamps or subtle anti-aliasing differences change a few pixels in nearly every capture. Perceptual hashing (pHash) is robust to minor compression artifacts. Tradeoff: SSIM computation adds CPU overhead on the client side. Alternative: event-based capture (only screenshot after click/scroll) misses animations or loading states; the hybrid approach (event + periodic keyframe) is safest.
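The hybrid approach (event + periodic keyframe) could be sketched as below. This is an illustration under stated assumptions, not a reference implementation: it uses a simple average hash as a stand-in for pHash (real pHash is DCT-based), it assumes each frame has already been downscaled to a small grayscale grid such as 8x8, and the class name `HybridCapturePolicy`, the 10-second keyframe interval, and the Hamming-distance cutoff of 4 are all hypothetical choices.

```python
import time


def average_hash(gray_pixels):
    """Average hash: one bit per pixel, set when the pixel is above the mean.

    Stand-in for pHash; assumes the frame is already a small grayscale grid.
    """
    mean = sum(gray_pixels) / len(gray_pixels)
    bits = 0
    for p in gray_pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")


class HybridCapturePolicy:
    """Send on events, on visible change, or as a periodic keyframe."""

    EVENTS = ("click", "scroll", "type")

    def __init__(self, keyframe_interval=10.0, max_distance=4,
                 clock=time.monotonic):
        self.keyframe_interval = keyframe_interval  # seconds between keyframes
        self.max_distance = max_distance            # tolerated hash difference
        self.clock = clock                          # injectable for testing
        self.last_hash = None
        self.last_sent_at = None

    def decide(self, gray_pixels, event=None):
        """Return 'initial', 'event', 'changed', 'keyframe', or None (skip)."""
        now = self.clock()
        h = average_hash(gray_pixels)
        if self.last_hash is None:
            reason = "initial"
        elif event in self.EVENTS:
            reason = "event"
        elif hamming(h, self.last_hash) > self.max_distance:
            reason = "changed"
        elif now - self.last_sent_at >= self.keyframe_interval:
            reason = "keyframe"
        else:
            return None
        # Compare future frames against the last frame actually sent,
        # so slow drift across many skipped frames is still detected.
        self.last_hash, self.last_sent_at = h, now
        return reason
```

A capture loop would call `policy.decide(frame, event=...)` for each frame and forward the screenshot only when the result is not `None`. The periodic keyframe covers animations and loading states that event-only capture would miss.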
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:16:19.470432+00:00— report_created — created