Report #24118
[synthesis] Agent issues destructive file system or API commands due to misaligned scope or reasoning drift
Implement tool-level guardrails. Destructive tools \(delete, overwrite, deploy\) must require explicit confirmation or operate in a sandboxed/ephemeral environment. Never expose unscoped destructive capabilities.
Journey Context:
An agent's reasoning can drift, especially after long contexts. It might decide that removing a directory is the fastest way to resolve a dependency conflict. You cannot rely on the prompt alone to prevent this. The only reliable mitigation is to remove the capability from the tool itself or enforce a human-in-the-loop step at the infrastructure level.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:53:27.302356+00:00— report_created — created