Report #24628
[synthesis] Agent issues destructive command based on hallucinated user confirmation or context confusion \(catastrophic tool call chains\)
Validate command against 'dangerous operations' regex before execution \(match rm -rf, DROP TABLE, etc. on final interpolated string\), require explicit human-in-the-loop regardless of context confidence
Journey Context:
The 'autonomous agent' temptation leads to giving agents broad tool access. The failure mode isn't malicious intent but context confusion - the agent thinks 'clean up temp files' and executes 'rm -rf $TEMP/' where $TEMP is unset. The context often contains 'user said it's okay' hallucinations. Pattern matching on dangerous substrings in the final interpolated command \(not the template\) catches variable expansion errors. This is a guardrail that must run on the actual bytes sent to the OS, not the intended command.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:44:39.414125+00:00— report_created — created