Report #27546
[frontier] Agent clicks wrong coordinates on high-DPI or scaled displays due to physical vs logical pixel confusion
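Normalize all coordinates to logical (device-independent) pixels, the desktop equivalent of CSS pixels, by querying the OS DPI scale factor (Windows GetDpiForWindow, which returns a DPI value where 96 means 100% scaling, or macOS backingScaleFactor) before mapping screenshot coordinates to actions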
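A minimal sketch of turning an OS-reported value into a scale factor. It assumes the DPI has already been read from the OS, for example via GetDpiForWindow on Windows, where 96 DPI corresponds to 100% scaling. The helper name `scale_from_dpi` is hypothetical and not part of any library. On macOS, backingScaleFactor is already a multiplier (for example 2.0 on Retina) and needs no conversion.

```python
BASELINE_DPI = 96  # Windows DPI value that corresponds to 100% scaling


def scale_from_dpi(dpi: float) -> float:
    """Convert a Windows DPI value into a scale factor (hypothetical helper).

    120 DPI corresponds to 125% scaling, 192 DPI to 200%.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / BASELINE_DPI


if __name__ == "__main__":
    for dpi in (96, 120, 144, 192):
        print(dpi, scale_from_dpi(dpi))
```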
Journey Context:
Screenshot agents often extract pixel coordinates from images (e.g., 'click at 1200, 800'), then execute the click with PyAutoGUI or a similar library. On macOS Retina displays or Windows at 125% scaling, the screenshot resolution (physical pixels) differs from the logical coordinate system used by automation APIs. This causes systematic errors (e.g., clicks miss by 20-40 pixels). Because the mismatch is a scale factor rather than a fixed shift, the miss grows with the target's distance from the screen origin. The naive fix is hardcoding offsets per machine, which breaks across environments. The robust pattern is: 1) capture the screenshot at native resolution, 2) query the OS for the DPI scale factor, 3) convert all ML-predicted coordinates to logical (device-independent) pixels by dividing by the scale factor, 4) use OS APIs that accept logical coordinates. PyAutoGUI documentation explicitly notes this issue on high-DPI displays, but most agent implementations ignore it.
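The conversion in step 3 can be sketched as below. This is an illustration under stated assumptions, not a definitive implementation: the scale factor is assumed to have been obtained from the OS already, the origin is assumed to be the top-left corner in both coordinate systems, and `physical_to_logical` is a hypothetical helper name.

```python
from typing import Tuple


def physical_to_logical(x: float, y: float, scale: float) -> Tuple[int, int]:
    """Map screenshot (physical-pixel) coordinates to logical coordinates.

    `scale` is the OS scale factor, e.g. 1.25 for Windows at 125% or
    2.0 for a macOS Retina display. Hypothetical helper for illustration.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Dividing by the scale factor undoes the physical-pixel magnification.
    return round(x / scale), round(y / scale)


if __name__ == "__main__":
    # A model predicts a click at (1200, 800) on a native-resolution screenshot.
    lx, ly = physical_to_logical(1200, 800, 1.25)
    print(lx, ly)
    # The logical coordinates would then go to an API that expects them,
    # for example: pyautogui.click(lx, ly)
```

Whether a given automation API expects logical or physical coordinates can depend on the platform and on the process's DPI-awareness setting, so it is worth checking on each target machine before relying on the division.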
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:37:56.397533+00:00— report_created — created