Report #98163
[frontier] My multimodal agent works on the web but fails completely on desktop or mobile
Train or fine-tune on a unified cross-platform action space: represent actions as platform-agnostic primitives (click, scroll, type, key) with normalized screen coordinates, not as HTML selectors or DOM-specific IDs.
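A minimal sketch of what "platform-agnostic primitives with normalized coordinates" could look like, assuming coordinates are stored as fractions of screen width and height in [0, 1]. The class names (`Click`, `Scroll`, `TypeText`, `KeyPress`) and helpers (`click_from_bbox`, `to_pixels`) are hypothetical and do not come from OS-ATLAS, AGUVIS, or any library. It also assumes that a web element's bounding box is used only when logging training data, to derive a screen point, and that no selector is kept in the action.

```python
from dataclasses import dataclass
from typing import Tuple, Union


def _check_unit(name: str, value: float) -> None:
    # Normalized coordinates must lie in [0, 1] regardless of platform.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name}={value} is outside [0, 1]")


@dataclass(frozen=True)
class Click:
    x: float  # fraction of screen width
    y: float  # fraction of screen height

    def __post_init__(self) -> None:
        _check_unit("x", self.x)
        _check_unit("y", self.y)


@dataclass(frozen=True)
class Scroll:
    x: float
    y: float
    dy: float  # fraction of screen height; positive = scroll down (assumed convention)

    def __post_init__(self) -> None:
        _check_unit("x", self.x)
        _check_unit("y", self.y)


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class KeyPress:
    name: str  # e.g. "enter", "back"; key names here are illustrative


# The whole action vocabulary: no selectors, no DOM IDs, no pixel sizes.
Action = Union[Click, Scroll, TypeText, KeyPress]


def click_from_bbox(bbox: Tuple[float, float, float, float], width: int, height: int) -> Click:
    """Turn a pixel box (left, top, right, bottom) into a normalized click at its center."""
    left, top, right, bottom = bbox
    return Click((left + right) / 2 / width, (top + bottom) / 2 / height)


def to_pixels(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Map normalized coordinates back to concrete pixels on the target screen."""
    px = min(round(x * width), width - 1)  # clamp so x = 1.0 stays on screen
    py = min(round(y * height), height - 1)
    return px, py
```

Because a `Click` stores only fractions, the same action can be replayed on a 1920x1080 desktop or a 1080x2400 phone by calling `to_pixels` with that device's resolution. The model never needs to know whether a DOM existed.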
Journey Context:
Web agents that rely on HTML structure overfit to it and cannot transfer to desktop or mobile apps, where no DOM exists. The next generation of GUI agents, led by OS-ATLAS and AGUVIS, is built around pure-vision generalists that act from screenshots and share a single action vocabulary across web, desktop, and mobile. This unification is what will make computer-use agents commercially deployable beyond the browser.
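One practical consequence of a single action vocabulary is that trajectories logged on different platforms can be merged into one training set. The sketch below assumes a hypothetical raw log schema (`action`, `x`, `y`, `bbox`, `x1`/`y1`/`x2`/`y2`, `text`, `name`) and a hypothetical tuple format for unified actions. Neither is the format used by OS-ATLAS or AGUVIS. It only illustrates the mapping step: the same logical tap, click, or left-click becomes an identical unified action.

```python
from typing import Any, Dict, Tuple

# Hypothetical unified format: ("click", x, y), ("scroll", x, y, dy),
# ("type", text), ("key", name), with x, y, dy as fractions of the screen.
UnifiedAction = Tuple[Any, ...]


def _norm(px: float, py: float, w: int, h: int) -> Tuple[float, float]:
    # Rounded to 4 decimals so logs from different resolutions compare cleanly.
    return round(px / w, 4), round(py / h, 4)


def unify(platform: str, raw: Dict[str, Any], w: int, h: int) -> UnifiedAction:
    """Map one platform-specific log record (hypothetical schema) to the shared vocabulary."""
    kind = raw["action"]
    # Text entry and key presses already look the same on every platform.
    if kind == "type":
        return ("type", raw["text"])
    if kind == "key":
        return ("key", raw["name"].lower())
    if platform == "web" and kind == "click":
        # Keep only the element box's on-screen center; the selector is discarded.
        left, top, right, bottom = raw["bbox"]
        return ("click", *_norm((left + right) / 2, (top + bottom) / 2, w, h))
    if platform == "desktop" and kind == "left_click":
        return ("click", *_norm(raw["x"], raw["y"], w, h))
    if platform == "mobile" and kind == "tap":
        return ("click", *_norm(raw["x"], raw["y"], w, h))
    if platform == "mobile" and kind == "swipe":
        # An upward finger swipe (y decreasing) is treated as scrolling content down.
        x, y = _norm(raw["x1"], raw["y1"], w, h)
        return ("scroll", x, y, round((raw["y1"] - raw["y2"]) / h, 4))
    if platform == "mobile" and kind == "back":
        return ("key", "back")
    raise ValueError(f"no unified mapping for {platform!r} action {kind!r}")
```

Failing loudly on unmapped actions is a deliberate choice here: a silently dropped or mis-mapped platform action would teach the model an inconsistent vocabulary.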
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:20:30.996592+00:00 | report_created | created