Report #90474
[frontier] Multi-modal context windows filling with redundant screenshots causing token exhaustion
Visual diff compression - only retain screenshots whose difference from the previous frame exceeds a threshold, and replace discarded frames with semantic captions
Journey Context:
Agents often screenshot every step; by step 20, the context window is 90% identical UI chrome (same navigation bar, same background), leaving no room for reasoning. The naive fix of 'only screenshot on action' misses state changes caused by background processes. The robust pattern is to compare consecutive frames with a perceptual hash (dHash) and retain a frame only when it differs from the previous one by more than 5%; for each dropped frame, inject a short text summary of what changed or stayed the same ('sidebar remained static'). This extends the effective horizon by 3-5x without losing state information.
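The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes frames arrive as row-major grayscale grids (lists of 0-255 integers), and the names `dhash`, `hash_distance`, and `compress` are hypothetical. It also assumes the report's 5% figure can be applied to the fraction of differing hash bits, which is not the same quantity as raw pixel variance. The caption text is a placeholder; a real summary of what changed would need a separate captioning step.

```python
from typing import List, Sequence, Tuple

# Assumed input format: grayscale pixels (0-255), row-major.
Frame = Sequence[Sequence[int]]

HASH_W, HASH_H = 8, 8  # 8x8 neighbour comparisons -> 64-bit hash


def _shrink(frame: Frame, out_w: int, out_h: int) -> List[List[float]]:
    """Box-average a grayscale frame down to out_w x out_h cells."""
    in_h, in_w = len(frame), len(frame[0])
    small = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max(y0 + 1, (oy + 1) * in_h // out_h)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max(x0 + 1, (ox + 1) * in_w // out_w)
            cells = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        small.append(row)
    return small


def dhash(frame: Frame) -> int:
    """Difference hash: one bit per horizontally adjacent pair of cells."""
    small = _shrink(frame, HASH_W + 1, HASH_H)
    bits = 0
    for row in small:
        for x in range(HASH_W):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hash_distance(a: int, b: int) -> float:
    """Fraction of hash bits that differ (normalised Hamming distance)."""
    return bin(a ^ b).count("1") / (HASH_W * HASH_H)


def compress(frames: Sequence[Frame], threshold: float = 0.05) -> List[Tuple[str, object]]:
    """Keep a frame only if it differs enough from the previous frame.

    Returns ("image", step_index) for retained frames and
    ("caption", text) for dropped ones. The caption is a placeholder.
    """
    out: List[Tuple[str, object]] = []
    prev_hash = None
    for step, frame in enumerate(frames):
        h = dhash(frame)
        if prev_hash is None or hash_distance(h, prev_hash) > threshold:
            out.append(("image", step))
        else:
            out.append(("caption", f"step {step}: screen visually unchanged since previous frame"))
        prev_hash = h  # compare consecutive frames, as the report describes
    return out
```

With a 64-bit hash, a 5% threshold means a frame is retained once more than 3 bits differ. Comparing against the previous frame follows the report's wording; comparing against the last retained frame instead is a possible variation that would catch slow drift across many near-identical frames.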
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:27:21.962802+00:00— report_created — created