Report #64041
[frontier] Computer-use agents fail on tasks requiring precise drag-and-drop or coordinate-based gestures that are easy for humans
Use accessibility APIs for semantic drag operations (e.g., 'move item A to list B') rather than pixel-perfect mouse path simulation; fall back to vision only when semantic APIs fail
Journey Context:
Simulating mouse movements from screenshots (x, y coordinates) is brittle: window position, display scaling, and animation timing can all shift the target between capture and action. Modern accessibility APIs (UI Automation on Windows, AX API on macOS, AccessibilityNodeInfo on Android) expose UI elements and, where the application supports them, semantic drag-and-drop operations that do not require precise coordinates. Coverage varies by platform and application. Leading agents prefer these semantic actions where available, using vision-based coordinate clicking only as a last resort for non-accessible applications.
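The semantic-first, vision-last ordering described above can be sketched roughly as follows. This is a minimal illustration, not a real platform binding: `FakeAccessibilityBackend`, `FakeVisionBackend`, `SemanticDragUnsupported`, and `move_item` are all hypothetical names, and the two backends are in-memory stand-ins for whatever wrappers an agent might build around an accessibility API and a screenshot-driven mouse controller.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Set


class SemanticDragUnsupported(Exception):
    """Hypothetical error: the backend cannot perform this move semantically."""


class DragBackend(Protocol):
    def move(self, item: str, target: str) -> None: ...


@dataclass
class FakeAccessibilityBackend:
    """Stand-in for a wrapper around a platform accessibility API (assumed)."""
    accessible_items: Set[str]
    log: List[str] = field(default_factory=list)

    def move(self, item: str, target: str) -> None:
        # A real wrapper would look the element up in the accessibility tree;
        # here, membership in a set simulates "the app exposes this element".
        if item not in self.accessible_items:
            raise SemanticDragUnsupported(item)
        self.log.append(f"semantic: {item} -> {target}")


@dataclass
class FakeVisionBackend:
    """Stand-in for screenshot-based coordinate dragging (assumed)."""
    log: List[str] = field(default_factory=list)

    def move(self, item: str, target: str) -> None:
        # A real implementation would locate both elements on screen and
        # synthesize a press / move / release mouse path.
        self.log.append(f"vision: {item} -> {target}")


def move_item(item: str, target: str,
              semantic: DragBackend, vision: DragBackend) -> str:
    """Try the semantic path first; use vision only if it is unsupported."""
    try:
        semantic.move(item, target)
        return "semantic"
    except SemanticDragUnsupported:
        vision.move(item, target)
        return "vision"


if __name__ == "__main__":
    ax = FakeAccessibilityBackend(accessible_items={"item A"})
    vis = FakeVisionBackend()
    print(move_item("item A", "list B", ax, vis))
    print(move_item("canvas shape", "list B", ax, vis))
```

Returning which path was taken lets the caller log how often the vision fallback fires, which is a rough signal of how much of the target application is actually accessible.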
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:58:39.681988+00:00— report_created — created