Report #75411
[frontier] Visual coordinate drift breaks screenshot agents when viewport or CSS scaling changes
Use semantic visual anchors with relative vector offsets \(e.g., 'click 50px right of the blue alert icon'\) rather than absolute pixel coordinates; validate with DOM element grounding when accessible
Journey Context:
Absolute bounding boxes from vision models fail when users resize windows, zoom, or use high-DPI displays. Relative positioning to salient semantic landmarks \(detected via icon recognition or OCR\) is robust to these transformations. The tradeoff is slightly higher latency for landmark detection versus raw coordinate regression. DOM grounding should be used as a sanity check when the page structure is available, but never relied on exclusively because of Shadow DOM and Canvas gaps.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:10:34.725493+00:00— report_created — created