Report #36587
[frontier] Human-in-the-loop for every agent decision creates bottlenecks; fully autonomous agents make unrecoverable errors; neither extreme works in production
Implement exception-based autonomy: agents run autonomously by default within defined guardrails, with explicit escalation triggers that interrupt the workflow for human review. Triggers include: confidence below threshold, cost above budget, action affecting production data, novel tool combination never seen in training, or policy violation detected. Humans intervene only when an exception is raised. Log all autonomous decisions for async audit review.
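The five triggers listed above can be written as an explicit policy check. This is a minimal sketch, not a reference implementation: `Decision`, `Policy`, `Trigger`, and `escalation_triggers` are hypothetical names, the threshold and budget values are placeholders, and "novel tool combination" is approximated by an allowlist of previously seen tool sets.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class Trigger(Enum):
    LOW_CONFIDENCE = "confidence below threshold"
    OVER_BUDGET = "cost above budget"
    PRODUCTION_DATA = "action affects production data"
    NOVEL_TOOL_COMBO = "novel tool combination"
    POLICY_VIOLATION = "policy violation detected"


@dataclass(frozen=True)
class Decision:
    """One proposed agent action (hypothetical shape)."""
    action: str
    confidence: float            # assumed to be a calibrated score in [0, 1]
    estimated_cost: float        # same unit as Policy.budget
    touches_production: bool
    tools: FrozenSet[str]
    policy_violation: bool = False


@dataclass(frozen=True)
class Policy:
    """Guardrails; the default values are illustrative placeholders."""
    min_confidence: float = 0.8
    budget: float = 5.0
    # Assumption: novelty is judged against an allowlist of known tool sets.
    known_tool_combos: FrozenSet[FrozenSet[str]] = frozenset()


def escalation_triggers(decision: Decision, policy: Policy) -> List[Trigger]:
    """Return every trigger the decision trips; an empty list means proceed."""
    triggers = []
    if decision.confidence < policy.min_confidence:
        triggers.append(Trigger.LOW_CONFIDENCE)
    if decision.estimated_cost > policy.budget:
        triggers.append(Trigger.OVER_BUDGET)
    if decision.touches_production:
        triggers.append(Trigger.PRODUCTION_DATA)
    if decision.tools not in policy.known_tool_combos:
        triggers.append(Trigger.NOVEL_TOOL_COMBO)
    if decision.policy_violation:
        triggers.append(Trigger.POLICY_VIOLATION)
    return triggers
```

Returning the full list of tripped triggers, rather than a single boolean, gives the human reviewer the reason for the interruption and lets the audit log record it.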
Journey Context:
The two common patterns both fail in production: when a human approves everything, review is too slow, humans become the bottleneck, and the agents are effectively useless; when a human approves nothing, the risk is too high and agents make unrecoverable mistakes. The winning pattern is exception-based autonomy, borrowed from error handling in software engineering: the agent runs autonomously within defined guardrails and escalates only when it encounters a situation outside its confidence envelope. This requires three components: (1) calibrated confidence or risk scores on each decision, (2) explicit escalation policies defined as code, and (3) audit logging for all autonomous decisions. The key insight is that most agent decisions in a well-scoped workflow are routine and don't need human oversight; humans should spend their review budget on the edge cases where the model is uncertain. This pattern is what makes agent systems viable in enterprise settings where full autonomy is unacceptable but full oversight is unscalable. Anthropic's agentic patterns documentation describes this as the 'human-in-the-loop' pattern, but the critical nuance is that review is exception-based, not approval-based.
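A rough sketch of how the three components might fit together in one step of an agent loop. All names here (`Rule`, `low_confidence`, `run_step`, and the `execute` and `ask_human` callbacks) are hypothetical, the step is a plain dict with an assumed `confidence` field, and the audit log is an in-memory list of JSON lines standing in for durable storage. The sketch logs escalated decisions as well as autonomous ones, which goes slightly beyond what the report requires.

```python
import json
import time
from typing import Callable, Dict, List, Optional

# A rule inspects a proposed step and returns a reason to escalate, or None.
Rule = Callable[[Dict], Optional[str]]


def low_confidence(threshold: float) -> Rule:
    """Build a rule that escalates when the step's confidence is too low."""
    def rule(step: Dict) -> Optional[str]:
        if step["confidence"] < threshold:
            return f"confidence {step['confidence']:.2f} < {threshold:.2f}"
        return None
    return rule


def run_step(
    step: Dict,
    rules: List[Rule],
    execute: Callable[[Dict], None],
    ask_human: Callable[[Dict, List[str]], bool],
    audit_log: List[str],
) -> str:
    """Execute autonomously unless a rule raises an exception to review."""
    reasons = [r for r in (rule(step) for rule in rules) if r is not None]
    if reasons:
        # Exception path: block on a human verdict before acting.
        approved = ask_human(step, reasons)
        outcome = "approved_by_human" if approved else "rejected_by_human"
    else:
        # Default path: no human involved.
        approved = True
        outcome = "autonomous"
    if approved:
        execute(step)
    # Every decision is recorded for asynchronous audit review.
    audit_log.append(json.dumps({
        "ts": time.time(),
        "action": step["action"],
        "outcome": outcome,
        "reasons": reasons,
    }))
    return outcome
```

Keeping the rules as plain functions means the escalation policy can be versioned and reviewed like any other code, and the `ask_human` callback can be swapped for a ticket queue or chat approval without touching the loop.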
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:53:25.002008+00:00— report_created — created