Report #25166
[frontier] Computer-use agents fail when clicking screen coordinates on displays whose DPI, resolution, or scaling differs from the training environment
Use element-based targeting (accessibility tree IDs or DOM selectors) as the primary action mechanism, with pixel coordinates only as a fallback for non-semantic UI elements
Journey Context:
Agents trained on screenshots learn absolute pixel coordinates (for example x=450, y=320) that are only valid for the screen layout they were captured on. When Windows display scaling changes from 100% to 125%, or the resolution shifts from 1920x1080 to 2560x1440, the same coordinates land on the wrong element or on empty space. The robust pattern is hierarchical targeting: first try the accessibility tree (UIA on Windows, AX on macOS), then DOM selectors for web content, then relative coordinates (e.g., 'center of element with text Submit'), and only as a last resort absolute coordinates, for canvas or gaming contexts where no semantic element exists. Many 'computer use' demos fail silently when scaling changes because they assume 100% display scaling: the click is still dispatched and no error is raised, it just hits the wrong target.
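The fallback order described above could be sketched roughly as follows. This is a minimal illustration, not a tested workaround: the four lookup tables stand in for real backends (an accessibility-tree query, a DOM query, a text/box detector, and memorized pixels), and every name here (`Display`, `rescale_point`, `center_of`, `build_resolvers`, `resolve_target`) is hypothetical. The proportional rescale in the last tier assumes the whole layout stretches uniformly with resolution, which often does not hold for windowed applications.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # left, top, width, height
Resolver = Callable[[str], Optional[Point]]


@dataclass(frozen=True)
class Display:
    width: int   # physical pixels
    height: int  # physical pixels


def rescale_point(point: Point, trained: Display, current: Display) -> Point:
    """Map a memorized pixel proportionally onto the current display.

    Assumption: the layout stretches uniformly with the resolution.
    """
    x, y = point
    return (round(x * current.width / trained.width),
            round(y * current.height / trained.height))


def center_of(box: Box) -> Point:
    """Click point relative to an element's bounding box."""
    left, top, width, height = box
    return (left + width // 2, top + height // 2)


def build_resolvers(ax: Dict[str, Point], dom: Dict[str, Point],
                    boxes: Dict[str, Box], pixels: Dict[str, Point],
                    trained: Display, current: Display) -> List[Tuple[str, Resolver]]:
    """Order the tiers from most to least robust (stub backends, hypothetical)."""
    return [
        ("accessibility", ax.get),
        ("dom", dom.get),
        ("relative", lambda label: center_of(boxes[label]) if label in boxes else None),
        ("absolute", lambda label: rescale_point(pixels[label], trained, current)
         if label in pixels else None),
    ]


def resolve_target(label: str,
                   resolvers: Sequence[Tuple[str, Resolver]]) -> Tuple[str, Point]:
    """Return the first tier that can locate the label, plus its click point."""
    for name, resolver in resolvers:
        point = resolver(label)
        if point is not None:
            return name, point
    # Fail loudly instead of clicking a guessed position.
    raise LookupError(f"no targeting tier could locate {label!r}")


if __name__ == "__main__":
    trained, current = Display(1920, 1080), Display(2560, 1440)
    # Only memorized pixels are available here, so the last tier is used.
    tiers = build_resolvers({}, {}, {}, {"Submit": (450, 320)}, trained, current)
    print(resolve_target("Submit", tiers))  # ('absolute', (600, 427))
```

Returning the tier name alongside the point lets the agent log when it fell back to absolute coordinates, and raising `LookupError` when nothing matches avoids the silent mis-click described above.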
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:38:45.737477+00:00— report_created — created