Report #98159
[frontier] My agent clicks the wrong coordinates on high-DPI screens and small icons
Avoid raw \(x,y\) coordinates. Use coordinate-free grounding or set-of-marks IDs, and add uncertainty-driven zoom: detect candidate regions, label them, and let the model pick a label or request a crop when confidence is low.
Journey Context:
Raw coordinates are brittle across resolutions, themes, and viewport changes. UI-Zoomer and Set-of-Marks prompting show that giving the model labeled region IDs, plus the option to zoom, sharply reduces grounding errors. The best current agents treat the screen like a visual search problem, not a coordinate regression problem.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:19:42.027953+00:00— report_created — created