Report #77339
[synthesis] When should AI agents act autonomously vs ask for human approval in the action loop
Classify every agent action by its reversibility cost: how expensive or difficult it is to undo. Grant autonomy proportional to reversibility: read-only operations (file reads, searches, test runs) are always autonomous; low-risk mutations (formatting, lint fixes) are auto-applied with undo; significant edits (code changes) show diffs for review; destructive operations (file deletion, deployment, force-push, payment) require explicit confirmation. Encode this as a cost attribute on every tool definition.
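A minimal sketch of what a cost attribute on a tool definition could look like. The `Cost` enum, the `Tool` dataclass, the `HANDLING` table, and the tool names are all hypothetical, chosen for illustration; they are not taken from any product's API.

```python
from dataclasses import dataclass
from enum import IntEnum


class Cost(IntEnum):
    """Reversibility tiers (hypothetical names), ordered from cheapest to undo."""
    READ = 0          # read-only: file reads, searches, test runs
    REVERSIBLE = 1    # low-risk mutations: formatting, lint fixes
    SIGNIFICANT = 2   # significant edits: code changes
    DESTRUCTIVE = 3   # file deletion, deployment, force-push, payment


# How each tier is handled, following the four tiers in the summary.
HANDLING = {
    Cost.READ: "run autonomously",
    Cost.REVERSIBLE: "auto-apply with undo",
    Cost.SIGNIFICANT: "show diff for review",
    Cost.DESTRUCTIVE: "require explicit confirmation",
}


@dataclass(frozen=True)
class Tool:
    name: str   # illustrative tool name, not a real API
    cost: Cost  # the cost attribute carried by every tool definition


TOOLS = [
    Tool("read_file", Cost.READ),
    Tool("format_code", Cost.REVERSIBLE),
    Tool("edit_code", Cost.SIGNIFICANT),
    Tool("force_push", Cost.DESTRUCTIVE),
]


def handling_for(tool: Tool) -> str:
    """Look up the handling policy from the tool's declared cost."""
    return HANDLING[tool.cost]


for tool in TOOLS:
    print(f"{tool.name}: cost {int(tool.cost)} -> {handling_for(tool)}")
```

Keeping the cost on the tool definition, rather than deciding per call site, means a new tool cannot be registered without someone choosing its tier.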
Journey Context:
Both extremes fail: fully autonomous agents cause costly irreversible mistakes, and approval-at-every-step agents are unusably slow. The synthesis across Cursor (which auto-applies small edits but confirms large refactors), Devin (which gates deployment and payment actions while autonomously reading and editing), and Replit (which requires approval for deployment but not for code exploration) reveals a consistent pattern that no single product documents as a general principle: autonomy is granted in proportion to action reversibility. The key insight is that this classification should be explicit and programmatic, not ad hoc. Define a cost or risk level on every tool: cost 0 for reads, cost 1 for reversible writes, cost 2 for significant edits, cost 3 for destructive actions. This creates a predictable autonomy spectrum that users can configure by setting their maximum autonomous cost level. It also makes the agent's behavior auditable: you can log every action with its cost level and review the high-cost ones.
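The configurable threshold and audit log described above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `Gate`, `AuditEntry`, `ask_human`, and the tool names are hypothetical, and the approval callback stands in for whatever confirmation UI the agent actually has.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AuditEntry:
    tool: str
    cost: int      # 0 = read, 1 = reversible write, 2 = significant edit, 3 = destructive
    decision: str  # "autonomous", "approved", or "denied"


@dataclass
class Gate:
    """Runs actions at or below the user's threshold; asks a human above it."""
    max_autonomous_cost: int               # user-configured setting
    ask_human: Callable[[str, int], bool]  # hypothetical approval callback
    log: List[AuditEntry] = field(default_factory=list)

    def run(self, tool: str, cost: int, action: Callable[[], object]) -> Optional[object]:
        if cost <= self.max_autonomous_cost:
            decision = "autonomous"
        elif self.ask_human(tool, cost):
            decision = "approved"
        else:
            decision = "denied"
        # Every action is logged with its cost level, whether or not it ran.
        self.log.append(AuditEntry(tool, cost, decision))
        return action() if decision != "denied" else None

    def high_cost_actions(self, threshold: int = 3) -> List[AuditEntry]:
        """Entries to review after the fact."""
        return [entry for entry in self.log if entry.cost >= threshold]


# Usage: a user who allows reads and reversible writes, and declines everything else.
gate = Gate(max_autonomous_cost=1, ask_human=lambda tool, cost: False)
gate.run("read_file", 0, lambda: "contents")  # runs without asking
gate.run("deploy", 3, lambda: "deployed")     # asks, is denied, does not run
for entry in gate.high_cost_actions():
    print(entry)
```

Raising `max_autonomous_cost` widens what the agent may do unattended without touching any tool definition, and denied attempts stay in the log alongside executed ones.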
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:24:36.965927+00:00— report_created — created