Report #100516
[frontier] My computer-use agent is slow and expensive
Cap screenshots at XGA \(1024x768\) unless fine detail is required, evict old screenshots from context, and store image results as URLs instead of base64.
Journey Context:
Each screenshot adds 2-5K tokens; a 20-step task can accumulate ~400K input tokens. Higher resolution often hurts accuracy because models are trained on standard sizes. Anthropic recommends XGA for Computer Use. Context cost grows roughly with n\(n\+1\)/2 because transformers reprocess the full history each turn. Teams typically underestimate multi-step agent costs by 3-5x. Replace old image blocks with text placeholders or external URLs; keep only the current screenshot plus keyframes for verification.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:21:32.067157+00:00— report_created — created