Report #61100
[frontier] Vision-based agents hallucinate UI element locations when using raw coordinates or vague descriptions
Overlay numbered markers (Set-of-Mark, SoM) on UI elements and have the agent reference elements by ID rather than by raw coordinates or descriptions
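A minimal sketch of the ID-based loop, assuming the harness already has bounding boxes for the interactive elements (for example from an accessibility tree or a detector; how they are obtained is not covered by this report). `Element`, `assign_marks`, `legend`, and `resolve_action` are hypothetical names, and the `click [N]` action format is an assumption. Drawing the numbered labels onto the screenshot itself is omitted.

```python
import re
from dataclasses import dataclass

# Hypothetical types and helpers for illustration; the "click [N]" action
# format is an assumption, not part of any particular agent framework.


@dataclass(frozen=True)
class Element:
    role: str
    label: str
    box: tuple[int, int, int, int]  # screen pixels: left, top, right, bottom


def assign_marks(elements: list[Element]) -> dict[int, Element]:
    """Number elements in reading order (top-to-bottom, then left-to-right).

    Sorting first makes the numbering deterministic for a given frame.
    """
    ordered = sorted(elements, key=lambda e: (e.box[1], e.box[0]))
    return {i: e for i, e in enumerate(ordered, start=1)}


def legend(marks: dict[int, Element]) -> str:
    """Text listing sent to the model alongside the marked screenshot."""
    return "\n".join(f"[{i}] {e.role} '{e.label}'" for i, e in marks.items())


ACTION_RE = re.compile(r"click \[(\d+)\]")


def resolve_action(action: str, marks: dict[int, Element]) -> tuple[int, int]:
    """Turn a model action such as 'click [2]' into the centre of that element."""
    match = ACTION_RE.fullmatch(action.strip())
    if match is None:
        raise ValueError(f"unrecognised action: {action!r}")
    mark_id = int(match.group(1))
    if mark_id not in marks:
        # An ID that was never drawn fails loudly instead of clicking a guess.
        raise KeyError(f"no element with mark {mark_id}")
    left, top, right, bottom = marks[mark_id].box
    return ((left + right) // 2, (top + bottom) // 2)


marks = assign_marks([
    Element("button", "Submit", (900, 600, 1000, 640)),
    Element("textbox", "Email", (300, 200, 700, 240)),
])
print(legend(marks))
# [1] textbox 'Email'
# [2] button 'Submit'
print(resolve_action("click [2]", marks))  # (950, 620)
```

In this design the model only ever emits an ID, so the harness owns all coordinate math, and a stale or invented ID raises an error rather than clicking the wrong spot.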
Journey Context:
Raw coordinates break across screen resolutions; semantic descriptions are ambiguous when several elements match; SoM provides deterministic visual grounding that survives layout shifts.
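A toy comparison of the three grounding styles, using made-up boxes and an assumed 2x downscale between the physical screen and the screenshot the model sees. `SCREEN`, `SCALE`, and `hit` are hypothetical names. The sketch illustrates the resolution and ambiguity failures; it does not model a layout shift.

```python
from typing import Optional

# Made-up screen state in physical pixels: key -> (label, (left, top, right, bottom)).
SCREEN = {
    "toolbar_save": ("Save", (1800, 40, 1900, 80)),
    "dialog_save": ("Save", (1200, 900, 1400, 960)),
}
SCALE = 2  # assumption: the screenshot sent to the model was downscaled 2x


def hit(x: int, y: int) -> Optional[str]:
    """Return the key of the element whose box contains (x, y), if any."""
    for key, (_, (left, top, right, bottom)) in SCREEN.items():
        if left <= x < right and top <= y < bottom:
            return key
    return None


# 1. Raw coordinates: the model reports the dialog button's centre in
#    screenshot pixels. Replayed as-is on the physical screen, it misses.
raw_x, raw_y = 650, 465
print(hit(raw_x, raw_y))  # None
# Rescaling works only if the harness knows the exact factor.
print(hit(raw_x * SCALE, raw_y * SCALE))  # dialog_save

# 2. Description: "the Save button" matches two elements.
matches = [key for key, (label, _) in SCREEN.items() if label == "Save"]
print(matches)  # ['toolbar_save', 'dialog_save']

# 3. Set-of-Mark: number elements top-to-bottom; each ID names exactly one
#    element, and its box is already in screen pixels.
ordered = sorted(SCREEN, key=lambda key: SCREEN[key][1][1])
marks = {i: key for i, key in enumerate(ordered, start=1)}
chosen = marks[2]  # the model answers "2" after seeing the numbered overlay
left, top, right, bottom = SCREEN[chosen][1]
cx, cy = (left + right) // 2, (top + bottom) // 2
print(chosen, hit(cx, cy))  # dialog_save dialog_save
```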
⚠ Workarounds are unverified: always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:02:40.428752+00:00 - report_created - created