Report #71083
[synthesis] Agent makes a catastrophic tool call while attempting to recover from a previous error
Implement immutable guardrails on destructive tools that cannot be overridden by the agent's context, and separate 'read' and 'write' execution environments so recovery attempts are sandboxed.
Journey Context:
When an agent encounters a permission error or a missing directory, its next logical step is often to 'fix' the environment: changing permissions, creating directories, or deleting conflicting files. The agent lacks human intuition about blast radius, so a minor error can cascade into a destructive command that destroys the very environment it was trying to fix. The insight is that error recovery mode is inherently more dangerous than normal execution mode, because the agent is operating outside its expected plan. Guardrails must therefore be static, not LLM-generated, so that nothing in the agent's context can argue them away.
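A minimal sketch of what a static guardrail with a read-only recovery mode could look like, in Python. Everything here is an assumption made for illustration: the tool names (`read_file`, `list_dir`, `write_file`, `run_shell`), the `Mode` enum, the denied-command list, and the `check_call` function are hypothetical and do not come from any particular agent framework. The token check on shell commands is deliberately crude and is not a complete defense.

```python
import os
import shlex
from enum import Enum
from types import MappingProxyType


class Mode(Enum):
    NORMAL = "normal"
    RECOVERY = "recovery"  # entered by the harness after a tool error


class GuardrailViolation(Exception):
    """Raised when a tool call is refused by the static policy."""


# Hypothetical tool names, classified once in code. The mapping is read-only,
# so nothing the agent produces at runtime can extend or relax it.
TOOL_POLICY = MappingProxyType({
    "read_file": "read",
    "list_dir": "read",
    "write_file": "write",
    "run_shell": "write",
})

# Illustrative set of programs that are always refused, in any mode.
DENIED_COMMANDS = frozenset({"rm", "chmod", "chown", "mkfs", "dd"})


def check_call(tool: str, args: dict, mode: Mode) -> None:
    """Raise GuardrailViolation if the call is not allowed; otherwise return."""
    kind = TOOL_POLICY.get(tool)
    if kind is None:
        raise GuardrailViolation(f"unknown tool: {tool}")

    # In recovery mode only read tools may run, so a 'fix' attempt cannot
    # modify the environment it is inspecting.
    if mode is Mode.RECOVERY and kind != "read":
        raise GuardrailViolation(f"{tool} is blocked during recovery")

    if tool == "run_shell":
        try:
            tokens = shlex.split(args.get("command", ""))
        except ValueError as exc:
            # Unparseable input is refused rather than guessed at.
            raise GuardrailViolation("unparseable command") from exc
        # Conservative check: refuse if any token names a denied program,
        # which also catches forms such as 'sudo rm' or '/bin/rm'.
        for token in tokens:
            if os.path.basename(token) in DENIED_COMMANDS:
                raise GuardrailViolation(f"denied command: {token}")
```

The design choice worth noting is that `mode` is an argument supplied by the calling harness, not something the agent sets. In this sketch the harness would switch to `Mode.RECOVERY` as soon as a tool call fails and switch back only once the agent is on its planned path again.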
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:53:31.763802+00:00 - report_created - created