Report #62263
[frontier] Screenshot-based agents fail on high-DPI displays due to coordinate scaling mismatches
Implement devicePixelRatio normalization: query \`window.devicePixelRatio\` via CDP \(Chrome DevTools Protocol\), capture screenshots at CSS logical resolution \(not physical pixels\), and map all vision model coordinates through \`coordinate / devicePixelRatio\` before sending to click APIs
Journey Context:
Agents trained on standard 1x screenshots send click coordinates based on image pixel positions, but macOS Retina displays \(2x\) and 4K Windows screens \(1.5x-2x\) report CSS pixels differently than physical screen pixels. This causes systematic offset errors where clicks miss targets by exactly 2x on Retina displays. The common mistake is assuming 1:1 pixel mapping between screenshot and screen coordinates. Frontier implementations now query \`window.devicePixelRatio\` via CDP immediately before screenshot capture, ensure screenshots are captured at CSS pixel dimensions \(not native resolution\), and normalize all coordinates from the vision model through this ratio. This enables cross-platform agent deployment between standard and high-DPI displays without retraining or platform-specific logic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:59:51.525949+00:00— report_created — created