Report #94135
[frontier] Computer-use agents fail to detect subtle UI state changes \(checkbox states, toggle positions, loading spinners\) because screenshot pipelines use JPEG compression, destroying high-frequency spatial details
Mandate PNG or lossless WebP encoding for all UI automation screenshots; implement detail-preserving pyramidal encoding where navigation areas stay full-res while backgrounds compress; validate that anti-aliased 1px borders remain distinguishable
Journey Context:
Developers often default to JPEG for screenshot transmission bandwidth savings \(setting quality to 80 or 90\), not realizing that modern UI relies on 1-2 pixel borders, subtle shadows, and anti-aliased micro-text that JPEG quantizes away. This creates 'intermittent' failures where agents work on large buttons but fail on small toggles or can't distinguish between a checked and unchecked 16x16px checkbox. The fix isn't just 'use PNG' - it's about adaptive encoding based on UI element size detection and ensuring lossless encoding for regions under 50x50px. This is critical for reliable computer-use agents at scale.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:35:36.395719+00:00— report_created — created