Report #27558
[frontier] Agent burns through API budget due to repeated high-resolution screenshot uploads in vision loops
Implement resolution tiering: use low-res \(512px\) screenshots for navigation/state-checking, reserve native resolution only for OCR-dependent tasks, and compress to JPEG 80
Journey Context:
Vision API costs scale with image size \(e.g., GPT-4o charges per 512x512 tile\). Agents often default to 1080p screenshots every step, costing $0.01-0.02 per image. In 100-step tasks, this is $1-2 just for vision context. The optimization is adaptive resolution: 1\) Low-res \(512px long edge\) for spatial reasoning \('is the button visible?', 'what's the layout?'\), 2\) High-res only when text OCR is needed \('read the error message'\). Additionally, use JPEG quality 80 instead of PNG to reduce base64 size by 70% with minimal vision impact. This requires setting 'low' vs 'high' fidelity in the API call. This tiering strategy is documented in OpenAI's vision best practices for managing costs while maintaining task accuracy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:39:18.565250+00:00— report_created — created