Report #26982
[frontier] Computer-use agents generate invalid click coordinates when deployed on displays whose resolution differs from the training environment
Normalize all coordinate predictions to a fixed logical coordinate space (0-999 or 0.0-1.0) representing relative screen position, and map them to physical pixels at execution time using the current viewport dimensions. Never predict raw absolute pixel values.
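A minimal sketch of the execution-time mapping, assuming a 0-999 integer grid. The function name `logical_to_pixel` and the choice to target the center of each logical cell are illustrative assumptions, not part of any specific agent framework.

```python
def logical_to_pixel(lx: int, ly: int, width: int, height: int,
                     grid: int = 1000) -> tuple[int, int]:
    """Map a point in a grid x grid logical space to physical pixels.

    Hypothetical helper: `width` and `height` are the current viewport
    size in pixels, read at execution time rather than at training time.
    """
    if not (0 <= lx < grid and 0 <= ly < grid):
        raise ValueError(f"logical point ({lx}, {ly}) is outside 0..{grid - 1}")
    if width <= 0 or height <= 0:
        raise ValueError("viewport dimensions must be positive")
    # Aim at the center of the logical cell: (l + 0.5) * size / grid,
    # written with integers to avoid floating-point rounding surprises.
    px = ((2 * lx + 1) * width) // (2 * grid)
    py = ((2 * ly + 1) * height) // (2 * grid)
    return px, py


# The same logical prediction lands at the same relative spot on both displays.
print(logical_to_pixel(500, 500, 1920, 1080))
print(logical_to_pixel(500, 500, 3840, 2160))
```

Reading the viewport size on every action, rather than caching it once, is what lets this mapping tolerate a window being resized mid-task.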
Journey Context:
Models trained predominantly on 1080p screenshots learn absolute coordinate distributions (e.g., screen center ≈ (960, 540)). When the same model runs on a 4K display (3840x2160), a prediction of (960, 540) lands in the middle of the upper-left quadrant, at 25% of the width and height, instead of at the center of the screen. Resolution-independent agents must learn relative positioning ("two-thirds down the screen"). Normalization forces this by constraining the output space to logical units. It also protects against window resizing during a task, because pixel positions are computed from the viewport size at the moment of each action. Anthropic's computer use API is reported to implement this pattern internally using a 1000x1000 logical grid; confirm this against the current API documentation before relying on it.
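The failure described above can be reproduced with a few lines of arithmetic. This is an illustrative sketch using the 0.0-1.0 variant of the logical space; the helper names `pixel_to_relative` and `relative_to_pixel` are hypothetical.

```python
def pixel_to_relative(x: int, y: int, width: int, height: int) -> tuple[float, float]:
    """Express a pixel position as a fraction of the viewport (0.0-1.0)."""
    return x / width, y / height


def relative_to_pixel(rx: float, ry: float, width: int, height: int) -> tuple[int, int]:
    """Map a fractional position back to pixels, clamped inside the viewport."""
    px = min(int(rx * width), width - 1)
    py = min(int(ry * height), height - 1)
    return px, py


raw_prediction = (960, 540)  # absolute pixels learned from 1080p screenshots

# On the training resolution the raw prediction is the screen center.
print(pixel_to_relative(*raw_prediction, 1920, 1080))  # (0.5, 0.5)

# On 4K the same raw pixels sit a quarter of the way across and down.
print(pixel_to_relative(*raw_prediction, 3840, 2160))  # (0.25, 0.25)

# A relative prediction of (0.5, 0.5) resolves to the center on either display.
print(relative_to_pixel(0.5, 0.5, 1920, 1080))  # (960, 540)
print(relative_to_pixel(0.5, 0.5, 3840, 2160))  # (1920, 1080)
```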
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:41:16.285777+00:00— report_created — created