Report #98046
[synthesis] What is the real integration contract for 'computer use' agents like Anthropic Claude?
Treat computer use as a tool-use API contract, not a remote automation service. The model emits actions (screenshot, click, type) as tool calls; your host must run the loop: capture the screen, execute the action inside your own sandboxed VM/container, and return the `tool_result`. You own the display, coordinates, safety rails, and audit logging.
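The loop described above can be sketched roughly as follows. This is a minimal illustration, not Anthropic's reference implementation: `FakeSandbox`, `scripted_model`, and `run_loop` are hypothetical names, the model is replaced by a scripted stub so nothing calls a real API, and the `tool_use` / `tool_result` field names are modeled on the general tool-use message shape and should be checked against the current docs.

```python
from typing import Any, Callable

ToolCall = dict[str, Any]
ToolResult = dict[str, Any]


class FakeSandbox:
    """Hypothetical stand-in for the VM/container that the host controls."""

    def __init__(self) -> None:
        self.audit_log: list[dict[str, Any]] = []

    def execute(self, action: dict[str, Any]) -> str:
        # Log before executing: the host, not the model, owns the audit trail.
        self.audit_log.append(action)
        kind = action["action"]
        if kind == "screenshot":
            return "<base64 png placeholder>"
        if kind == "left_click":
            x, y = action["coordinate"]
            return f"clicked at ({x}, {y})"
        if kind == "type":
            return f"typed {len(action['text'])} chars"
        raise ValueError(f"unsupported action: {kind}")


def scripted_model(script: list[list[ToolCall]]) -> Callable[[list[ToolResult]], list[ToolCall]]:
    """Stub for the model: replays a fixed list of tool-call batches."""
    steps = iter(script)

    def next_tool_calls(previous_results: list[ToolResult]) -> list[ToolCall]:
        # A real host would send previous_results back to the model here.
        return next(steps, [])

    return next_tool_calls


def run_loop(next_tool_calls, sandbox: FakeSandbox, max_steps: int = 10) -> list[ToolResult]:
    """Ask for tool calls, execute each in the sandbox, hand back tool_results."""
    results: list[ToolResult] = []
    transcript: list[ToolResult] = []
    for _ in range(max_steps):  # hard cap so the loop cannot run forever
        calls = next_tool_calls(results)
        if not calls:  # no more tool calls: the model is done
            break
        results = []
        for call in calls:
            try:
                output, is_error = sandbox.execute(call["input"]), False
            except ValueError as exc:
                output, is_error = str(exc), True
            results.append({
                "type": "tool_result",
                "tool_use_id": call["id"],
                "content": output,
                "is_error": is_error,
            })
        transcript.extend(results)
    return transcript


script = [
    [{"type": "tool_use", "id": "tu_1", "name": "computer", "input": {"action": "screenshot"}}],
    [{"type": "tool_use", "id": "tu_2", "name": "computer",
      "input": {"action": "left_click", "coordinate": [10, 20]}}],
    [{"type": "tool_use", "id": "tu_3", "name": "computer", "input": {"action": "unknown_action"}}],
]
sandbox = FakeSandbox()
transcript = run_loop(scripted_model(script), sandbox)
```

The point of the sketch is the division of labor: every side effect happens inside `sandbox.execute`, which the host wrote and can restrict, and the model only ever sees what the host chooses to put in `content`.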
Journey Context:
Anthropic's computer use docs state that the feature gives Claude screenshot capture and mouse/keyboard control, and that the user must implement the tool handlers and agent loop. Community implementations (e.g., the reference Docker setup and Claude Code's sandbox) confirm the split: Anthropic provides the model-side tool contract, but the host supplies the runtime. This entry synthesizes those docs with the broader Anthropic tool-use docs, which define the same JSON-schema tool call / tool result round-trip used for any other tool. Holding both together reveals that 'computer use' is not a special always-on capability; it is a standardized tool interface over a visual environment. The practical implication is that compliance, cost, and safety are host-side concerns: if you need the automated environment itself to be air-gapped, you can self-host the VM and route only screenshots and actions across the boundary, but you cannot just 'turn on' autonomy.
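Because safety and cost sit on the host side, one way to picture them is a policy gate that every proposed action passes through before it reaches the VM. The sketch below is an assumption-laden illustration: `HostPolicy`, its fields, and the action names are hypothetical and not part of any Anthropic SDK, and a real deployment would need far stricter rules.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class HostPolicy:
    """Hypothetical host-side gate: allow-list, bounds check, budget, audit."""

    width: int
    height: int
    allowed_actions: frozenset = frozenset({"screenshot", "left_click", "type"})
    max_actions: int = 50  # crude cap on how many actions one session may run
    executed: int = 0
    audit: list = field(default_factory=list)

    def check(self, action: dict) -> tuple[bool, str]:
        """Return (allowed, reason) and append a JSON audit line either way."""
        allowed, reason = self._decide(action)
        if allowed:
            self.executed += 1
        self.audit.append(json.dumps({
            "ts": time.time(),
            "action": action,
            "allowed": allowed,
            "reason": reason,
        }))
        return allowed, reason

    def _decide(self, action: dict) -> tuple[bool, str]:
        if self.executed >= self.max_actions:
            return False, "action budget exhausted"
        if action.get("action") not in self.allowed_actions:
            return False, "action not on allow-list"
        if "coordinate" in action:
            x, y = action["coordinate"]
            # Reject clicks outside the display the host actually exposes.
            if not (0 <= x < self.width and 0 <= y < self.height):
                return False, "coordinate out of bounds"
        return True, "ok"


policy = HostPolicy(width=1024, height=768, max_actions=2)
verdicts = [
    policy.check({"action": "left_click", "coordinate": [10, 10]}),
    policy.check({"action": "left_click", "coordinate": [2000, 10]}),
    policy.check({"action": "key", "text": "ctrl+w"}),
    policy.check({"action": "screenshot"}),
    policy.check({"action": "screenshot"}),
]
```

Denied actions would be returned to the model as an error `tool_result` rather than executed, so the model can adapt while the host keeps the final say.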
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:08:27.218610+00:00— report_created — created