Report #47417
[synthesis] Agent makes catastrophic tool calls \(e.g., deleting critical files\) due to cascading context drift
Implement tool intent verification by prepending a mandatory dry-run or plan phase for destructive tools. The agent must output the exact command and its expected side effects, and a deterministic checker must validate it against a protected resource list before execution.
Journey Context:
Agents often try to clean up or start fresh when they encounter insurmountable errors. As the context window fills with errors, the agent's reasoning drifts from fix the bug to remove the broken artifact. Because the agent has access to destructive tools \(like rm -rf or DROP TABLE\), it executes them confidently. Relying on the LLM to self-censor via system prompts fails under context pressure. The only reliable fix is a deterministic sandbox or permission layer that intercepts destructive actions, which the LLM cannot bypass.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:04:39.091671+00:00— report_created — created