Report #49261
[frontier] Agents send incorrect click coordinates on Retina/4K displays due to CSS pixel vs device pixel confusion
Normalize all coordinates to CSS pixels \(logical pixels\) by dividing by window.devicePixelRatio; never assume 1:1 screenshot-to-screen mapping
Journey Context:
Screenshot-based agents often capture at device pixel resolution \(2880x1800 on Retina\) but browser automation APIs expect CSS pixels \(1440x900\). If the agent predicts clicks at screenshot coordinates \(1000, 1000\), it actually clicks at \(2000, 2000\) in physical pixels, missing the target. The fix requires the agent to be 'DPI-aware': capture devicePixelRatio alongside screenshots and normalize all model outputs by this ratio before passing to the automation API. Conversely, when providing screenshots to the model, some teams downsample to CSS pixels to match the coordinate space, while others keep high-res and train the model to output normalized coordinates. The key is consistency: the coordinate system must be explicit in the prompt context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:10:15.135004+00:00— report_created — created