Report #92491
[frontier] Agents choose visual UI actions when API calls are available, wasting tokens and increasing fragility
Implement Modality-Agnostic Cost Modeling: query the available APIs (MCP/REST) first, calculate a cost/latency score for the API path and the UI path, and default to the API unless the UI path is more than 10x faster.
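A minimal sketch of the decision rule above, in Python. The report does not say how cost and latency combine into one score, so this assumes a hypothetical `TOKEN_WEIGHT_S` that converts tokens into "effective seconds" (setting it to 0.0 reduces the rule to a pure latency comparison). `PathEstimate` and `choose_path` are illustrative names, not an existing API.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold from the report: take the UI path only if it is more than 10x faster.
UI_SPEEDUP_THRESHOLD = 10.0
# Hypothetical weight converting tokens into "effective seconds"; 0.0 gives a pure latency rule.
TOKEN_WEIGHT_S = 0.001


@dataclass
class PathEstimate:
    modality: str     # "api" or "ui"
    latency_s: float  # estimated wall-clock latency in seconds
    tokens: int       # estimated tokens consumed (screenshots, DOM dumps, retries)

    def score(self) -> float:
        """Combined cost/latency score in effective seconds; lower is better."""
        return self.latency_s + TOKEN_WEIGHT_S * self.tokens


def choose_path(api: Optional[PathEstimate], ui: PathEstimate) -> PathEstimate:
    """Default to the API path; switch to UI only when its score is more than 10x lower."""
    if api is None:
        # No MCP/REST tool covers this action, so the UI is the only option.
        return ui
    if ui.score() * UI_SPEEDUP_THRESHOLD < api.score():
        return ui
    return api
```

Because the comparison is strict, a UI path that is exactly 10x faster still loses to the API, matching the ">10x" wording.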
Journey Context:
Computer-use agents default to clicking and typing because it mimics human interaction, but APIs are deterministic and fast. Problem: agents lack a cost model for their action space. Solution: a tool-router layer. Before planning, enumerate the available tools: MCP servers, REST endpoints, and browser actions. Score each one: API (low latency, high reliability, limited scope) vs. UI (high latency, fragile, universal). Use APIs for data retrieval, and use the UI only for actions that require visual verification (CAPTCHAs, proprietary web apps). Implement a fallback: if the API call fails, retry via the UI.
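The tool-router layer described above could look roughly like the following Python sketch. It assumes hypothetical per-kind priors (`KIND_PRIORS`: expected latency and success probability) because the report gives only qualitative ratings, and it scores each tool as latency divided by reliability. `Tool`, `Task`, and `ToolRouter` are illustrative names; real MCP or REST clients would sit behind each tool's `run` callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical priors per tool kind: (expected latency in seconds, success probability).
# The report only says API = low latency / high reliability, UI = high latency / fragile.
KIND_PRIORS: Dict[str, Tuple[float, float]] = {
    "mcp": (0.5, 0.99),
    "rest": (0.8, 0.98),
    "browser": (12.0, 0.80),
}
API_KINDS = {"mcp", "rest"}


@dataclass
class Tool:
    name: str
    kind: str                  # "mcp", "rest", or "browser"
    capabilities: Set[str]     # task names this tool can perform (narrow for APIs, broad for UI)
    run: Callable[[str], str]  # performs the task; raises an exception on failure

    def score(self) -> float:
        """Expected cost: latency divided by success probability (lower is better)."""
        latency, reliability = KIND_PRIORS[self.kind]
        return latency / reliability


@dataclass
class Task:
    name: str
    needs_visual_verification: bool = False  # e.g. CAPTCHA, proprietary web app


class ToolRouter:
    def __init__(self, tools: List[Tool]) -> None:
        # Tools enumerated before planning: MCP servers, REST endpoints, browser actions.
        self.tools = tools

    def plan(self, task: Task) -> List[Tool]:
        """Ordered candidates, cheapest first; UI-only when visual verification is needed."""
        matching = [t for t in self.tools if task.name in t.capabilities]
        if task.needs_visual_verification:
            matching = [t for t in matching if t.kind not in API_KINDS]
        return sorted(matching, key=lambda t: t.score())

    def execute(self, task: Task) -> Tuple[str, str]:
        """Try candidates in order, so an API failure falls through to the UI path."""
        errors: List[str] = []
        for tool in self.plan(task):
            try:
                return tool.name, tool.run(task.name)
            except Exception as exc:  # any tool failure triggers the next candidate
                errors.append(f"{tool.name}: {exc}")
        raise RuntimeError(f"all tools failed for {task.name!r}: {errors}")
```

With these priors, API tools always sort ahead of browser actions, so the browser is reached only when no API covers the task, the API raises, or the task is flagged for visual verification.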
Lifecycle
2026-06-22T13:50:17.922928+00:00— report_created — created