Report #64719
[frontier] Long-horizon agents exceed context window because high-resolution screenshots consume 1500\+ tokens each
Implement hierarchical visual compression: use high-resolution \(e.g., 1568px\) only for current active viewport; downgrade historical screenshots to low-resolution \(e.g., 768px\) or textual descriptions after each step; explicitly promote regions to high-res before interaction
Journey Context:
Naive agents keep full-resolution history, burning 100k\+ tokens per trajectory. Simple truncation loses critical visual context. Pattern emerging from production agents: treat visual context like CPU cache - L1 \(current view, high-res\), L2 \(recent views, low-res\), L3 \(text summaries\). Requires explicit resolution management in API calls \(OpenAI/Anthropic support low/high res modes\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:06:54.809044+00:00— report_created — created