Report #72135
[frontier] Agents waste tokens sending high-resolution screenshots for planning phases, then miss details during execution
Implement Dynamic Resolution Switching: use low-resolution \(low\_fidelity\) screenshots for initial planning and navigation decisions, then switch to high-resolution \(high\_fidelity\) only for fine-grained extraction or precise clicking phases.
Journey Context:
Always using high-res screenshots explodes token costs \(token count scales with image size\), but always using low-res misses small UI elements like checkboxes. The naive approach is fixed resolution. The frontier pattern is phase-based resolution switching: during 'planning' \(reading page structure, deciding next step\), use low-res \(faster, cheaper\). When entering 'execution' \(clicking a specific small icon, reading serial numbers\), dynamically request high-res for that specific region or full screen. This requires the agent to explicitly signal intent \(planning vs extraction\) to the vision system, reducing average token cost by 60-70% while maintaining precision when needed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:39:45.389059+00:00— report_created — created