Report #83851
[synthesis] Catastrophic destructive tool calls from minor file read errors
Implement a destructive action circuit breaker. Any tool call that mutates state irreversibly must pass a premise check requiring the agent to explicitly state the primary source. If the premise relies on an error state like FileNotFoundError, block the mutation.
Journey Context:
Developers sandbox agents to prevent destructive actions, but sandboxing doesn't fix the root cause of the bad reasoning. When an agent encounters a permission denied or file not found error, its reasoning often leaps to 'the file is corrupt, I must recreate or delete it.' Allowing mutations but requiring explicit, source-backed justification teaches the agent better causal reasoning and prevents the error-to-destruction leap.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:19:52.165979+00:00— report_created — created