Report #68139
[architecture] Over-automation of irreversible high-blast-radius actions or under-automation of safe reversible tasks
Implement a risk matrix that tags each agent action with a reversibility flag and an estimated remediation cost, and trigger human-in-the-loop (HITL) approval when that cost exceeds a threshold, independent of model confidence scores.
Journey Context:
Teams often trigger HITL on confidence scores alone, but a high-confidence irrevocable action (e.g., a financial transfer) is riskier than a low-confidence reversible one (e.g., drafting an email). The error is conflating model uncertainty with business risk. The alternative, blanket HITL for all actions, defeats the purpose of automation. The fix is to tag each tool/action in the agent's arsenal with 'blast radius' metadata and to query a human only when the product of the probability of error (one minus the calibrated confidence) and the remediation cost exceeds a risk budget. This requires explicit risk modeling, not just ML metrics.
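A minimal sketch of the gating rule described above, in Python. Everything named here is hypothetical and for illustration only: the `ActionRisk` record, the `RISK_REGISTRY` contents, the cost figures, and the `risk_budget` and `irreversible_cost_cap` defaults are assumptions, not values from this report. It combines the two checks: a hard escalation for costly irreversible actions regardless of confidence, and an expected-loss check (probability of error times remediation cost) for everything else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRisk:
    """Blast-radius metadata attached to one agent tool/action (hypothetical schema)."""
    name: str
    reversible: bool
    remediation_cost: float  # estimated cost to undo or repair a mistake, in arbitrary units


# Hypothetical registry; real entries and costs would come from explicit risk modeling.
RISK_REGISTRY = {
    "draft_email": ActionRisk("draft_email", reversible=True, remediation_cost=1.0),
    "transfer_funds": ActionRisk("transfer_funds", reversible=False, remediation_cost=50_000.0),
}


def expected_loss(risk: ActionRisk, calibrated_confidence: float) -> float:
    """Probability of error (1 - calibrated confidence) times remediation cost."""
    if not 0.0 <= calibrated_confidence <= 1.0:
        raise ValueError("calibrated_confidence must be in [0, 1]")
    return (1.0 - calibrated_confidence) * risk.remediation_cost


def needs_human_approval(
    action_name: str,
    calibrated_confidence: float,
    risk_budget: float = 100.0,              # assumed budget per action
    irreversible_cost_cap: float = 10_000.0,  # assumed hard threshold
) -> bool:
    risk = RISK_REGISTRY.get(action_name)
    if risk is None:
        # Assumption: an action with no risk tag is escalated rather than trusted.
        return True
    # Costly irreversible actions escalate no matter how confident the model is.
    if not risk.reversible and risk.remediation_cost > irreversible_cost_cap:
        return True
    # Otherwise escalate only when the expected loss exceeds the budget.
    return expected_loss(risk, calibrated_confidence) > risk_budget
```

With these assumed numbers, `needs_human_approval("transfer_funds", 0.99)` escalates despite the high confidence, while `needs_human_approval("draft_email", 0.30)` does not, which is the asymmetry the report argues for.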
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:51:06.586014+00:00— report_created — created