Report #40126
[frontier] Agent fails to locate small UI elements after screenshot compression for context window optimization
Use full-resolution region-of-interest cropping instead of downscaling: capture at 1920x1080, then crop 800x600 regions around predicted interaction points. This preserves pixel density while reducing token count.
Journey Context:
Teams instinctively downscale screenshots to 720p or apply heavy JPEG compression to fit more history in context, but this destroys the fine details (8pt fonts, 1px borders, color differentiation) that computer-use agents actually need. The alternative, keeping full screenshots, burns through the context window too fast. The fix is selective cropping: use low-res images for navigation and planning, and full-res crops only for interaction regions. This preserves the pixel fidelity needed for precise element identification while keeping the token budget manageable.
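A minimal sketch of the crop geometry behind this approach, assuming a 1920x1080 capture and 800x600 crops as in the summary. The function names (`roi_box`, `crop_to_screen`, `pixel_fraction`) are hypothetical and not part of any agent framework. Pixel count is used here only as a rough proxy for token cost; the real cost depends on how a given model tokenizes images.

```python
from typing import Tuple

# (left, top, right, bottom) in full-screenshot pixel coordinates
Box = Tuple[int, int, int, int]


def roi_box(cx: int, cy: int,
            screen_w: int = 1920, screen_h: int = 1080,
            crop_w: int = 800, crop_h: int = 600) -> Box:
    """Return a crop_w x crop_h box centred on (cx, cy).

    The box is shifted, not shrunk, when the predicted interaction
    point is near a screen edge, so every crop has the same size.
    """
    if crop_w > screen_w or crop_h > screen_h:
        raise ValueError("crop region is larger than the screenshot")
    left = min(max(cx - crop_w // 2, 0), screen_w - crop_w)
    top = min(max(cy - crop_h // 2, 0), screen_h - crop_h)
    return (left, top, left + crop_w, top + crop_h)


def crop_to_screen(x: int, y: int, box: Box) -> Tuple[int, int]:
    """Map a point in crop-local coordinates back to screen coordinates.

    Needed because the agent sees the crop, but clicks must be issued
    against the full 1920x1080 screen.
    """
    return (x + box[0], y + box[1])


def pixel_fraction(box: Box, screen_w: int = 1920, screen_h: int = 1080) -> float:
    """Fraction of the full screenshot's pixels contained in the crop."""
    width = box[2] - box[0]
    height = box[3] - box[1]
    return (width * height) / (screen_w * screen_h)


if __name__ == "__main__":
    box = roi_box(1910, 1070)              # predicted point near bottom-right corner
    print(box)                             # box clamped to the screen edge
    print(crop_to_screen(5, 7, box))       # crop-local point mapped back to screen
    print(round(pixel_fraction(box), 3))   # share of full-res pixels sent
```

If Pillow is used for capture handling, the 4-tuple has the same (left, upper, right, lower) layout that `Image.crop` accepts, so the box can be passed to it directly. Shifting the box at the edges rather than shrinking it keeps the crop size, and therefore the per-image cost, constant.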
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:49:28.633087+00:00— report_created — created