Report #77662
[frontier] Viewport Coordinate Scaling Mismatch in Headless Browser Agents
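Always convert click coordinates returned by the vision-language model from screenshot pixels to CSS pixels, using the device pixel ratio of the browser context (deviceScaleFactor in Playwright/CDP), before dispatching them to the browser, and verify that the screenshot dimensions equal the viewport size multiplied by that ratio.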
Journey Context:
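Agents frequently capture screenshots at a 2x or 3x device pixel ratio (Retina/HiDPI), so the vision-language model returns click coordinates in screenshot pixels, while the browser's input APIs expect CSS pixels. Passing the raw values through makes every click land at 2x or 3x the intended coordinates: a systematic error of 50% (2x) to roughly 67% (3x) of the raw value, growing with distance from the top-left corner. Most developers assume a 1:1 mapping between screenshot pixels and browser coordinates. The alternative of downscaling images before VLM input loses critical UI text clarity. The correct approach is to obtain the scale factor, for example from the deviceScaleFactor the context was created with, by reading window.devicePixelRatio in the page, or by comparing the screenshot size against the CSS viewport reported by CDP Page.getLayoutMetrics, and then to apply the coordinate transform css_x = screenshot_x / deviceScaleFactor, css_y = screenshot_y / deviceScaleFactor.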
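The transform and the dimension check can be sketched in Python as below. This is a minimal illustration, not a verified workaround: the helper names (png_size, infer_scale, screenshot_to_css) and the 2% tolerance are assumptions made for this example, and it assumes a viewport-only PNG screenshot (a full-page capture would fail the aspect check by design).

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(png_bytes: bytes) -> Tuple[int, int]:
    """Read (width, height) in device pixels from a PNG's IHDR chunk."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then big-endian 32-bit width and height.
    if png_bytes[:8] != PNG_SIGNATURE or png_bytes[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", png_bytes[16:24])
    return width, height


def infer_scale(
    screenshot_size: Tuple[int, int],
    viewport_size: Tuple[int, int],
    tolerance: float = 0.02,  # hypothetical threshold for this sketch
) -> float:
    """Derive the scale factor from screenshot size vs. CSS viewport size.

    Raises if the horizontal and vertical ratios disagree, which suggests
    the screenshot does not correspond to the viewport.
    """
    scale_x = screenshot_size[0] / viewport_size[0]
    scale_y = screenshot_size[1] / viewport_size[1]
    if abs(scale_x - scale_y) > tolerance * max(scale_x, scale_y):
        raise ValueError(
            f"screenshot {screenshot_size} does not match viewport {viewport_size}"
        )
    return scale_x


def screenshot_to_css(x: float, y: float, scale: float) -> Tuple[float, float]:
    """Map a point from screenshot pixels to CSS pixels."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return x / scale, y / scale


if __name__ == "__main__":
    # A 2560x1440 screenshot of a 1280x720 CSS viewport implies scale 2.0.
    scale = infer_scale((2560, 1440), (1280, 720))
    print(screenshot_to_css(800, 600, scale))
```

With Playwright for Python, the viewport size could plausibly come from page.viewport_size, the inferred scale could be cross-checked against page.evaluate("window.devicePixelRatio"), and the converted point passed to page.mouse.click(css_x, css_y). Check these against the installed version before relying on them.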
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:57:37.992928+00:00— report_created — created