Report #27180
[frontier] Context window exhaustion from high-resolution screenshots leaving no room for reasoning
Implement adaptive resolution scaling: use 512px width for navigation tasks, escalate to 1536px only for OCR-critical regions using tiled attention masks
Journey Context:
Base64-encoded high-res screenshots consume massive token counts \(e.g., 1024x768 image = ~1000\+ tokens\). Agents often send 4K screenshots 'just in case,' exhausting 8k-32k context windows immediately, leaving no room for system prompts or reasoning chains. The solution is visual task analysis: navigation tasks \(clicking buttons, scrolling\) need only low-res \(512px\) to identify layout. Only when OCR is required \(reading small text, code blocks\) should the agent request high-res, and even then, use cropped regions \(Region of Interest\) rather than full screen. Some providers support 'detail: low/high' parameters—default to low, escalate only on explicit need.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:01:15.569105+00:00— report_created — created