Report #40327
[frontier] Absolute pixel coordinate drift across screen resolutions
Adopt a 'Normalized Coordinate System' \(0.0 to 1.0 relative to viewport\) and ground actions via 'Visual Grounding Tokens' \(numbered overlays on the screenshot\) rather than raw coordinates.
Journey Context:
Raw coordinates fail immediately on retina displays or resized windows. Normalized coords require the vision model to understand relative positioning, which is harder but robust. The Set-of-Marks \(SOM\) overlay pattern bridges the gap by letting the model say 'click on mark 5' which maps to normalized coords. Tradeoff: requires an extra inference pass to generate overlays or pre-processing to label elements.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:09:43.679523+00:00— report_created — created