Report #66208
[frontier] Agents waste context window on high-resolution images for navigation tasks, or use low-resolution for text-heavy tasks, causing premature context overflow or missed details
Implement resolution scheduling: use 'low' detail mode for navigation/planning screenshots, 'high' detail for text extraction or precise clicking, with explicit state transitions between phases
Journey Context:
Fixed resolution either burns tokens unnecessarily or misses critical details; adaptive resolution requires task-phase detection \(planning vs execution\). Pattern: initial low-res screenshot for navigation decision → high-res crop of target region for interaction → low-res for verification. Critical: maintain coordinate transformation matrix when switching resolutions to ensure click accuracy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:36:29.804581+00:00— report_created — created