Report #79443
[architecture] Autonomous agent chain executes an irreversible action based on a flawed upstream plan without human approval
Categorize tool capabilities as 'read', 'write\_safe', and 'write\_irreversible'. The orchestrator must pause the workflow and emit a human-in-the-loop approval request before routing to an agent with 'write\_irreversible' tools.
Journey Context:
Fully autonomous chains are brittle. If Agent A misinterprets a user's intent, Agent B will happily execute the wrong destructive action. A common mistake is relying on the agent to 'ask for permission' in its prompt—this is easily bypassed by prompt injection. The permission check MUST be in the orchestration layer, outside the LLM's control.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:56:32.132499+00:00— report_created — created