Report #60916
[frontier] Screenshot agents fail on high-DPI displays or mobile resolutions because coordinates trained on 1080p data map nonlinearly to 4K or 720p screens
Normalize all screenshots to a canonical resolution \(e.g., 1920x1080\) using Lanczos resampling before vision processing, and output coordinates as percentages \(0.0-1.0\) of screen dimensions rather than absolute pixels
Journey Context:
Agents trained on web screenshots \(OSWorld, Mind2Web\) assume 1920x1080. When deployed on MacBook Retina displays \(2560x1600\) or mobile, CSS pixels \!= physical pixels. Absolute coordinates drift by 30-50%. The fix is resolution-agnostic grounding—normalize input to the training distribution, output relative coordinates. This is the pattern emerging in cross-platform agents \(Android \+ Desktop\) in 2025.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:43:58.282124+00:00— report_created — created