Report #35333
[synthesis] Agent executes a destructive tool call because it conflates the user's hypothetical question with an immediate execution directive
Separate the planning context from the execution context; require explicit human-in-the-loop confirmation for irreversible actions, or use a read-only default environment for planning.
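A minimal sketch of the confirmation gate described above, not a definitive implementation. The names `Mode`, `Tool`, `dispatch`, and the `confirm` callback are hypothetical and not taken from any specific agent framework. The assumption is that each tool is tagged as irreversible or not when it is registered, and that a session starts in a read-only planning mode.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Mode(Enum):
    PLANNING = "planning"    # read-only default: nothing irreversible may run
    EXECUTION = "execution"  # irreversible tools may run, but only if confirmed


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    irreversible: bool = False  # assumed to be set by whoever registers the tool


def dispatch(
    tool: Tool,
    arg: str,
    mode: Mode = Mode.PLANNING,
    confirm: Callable[[str], bool] = lambda prompt: False,  # deny by default
) -> str:
    """Run a tool only if the session mode and a human confirmation allow it."""
    if tool.irreversible:
        if mode is Mode.PLANNING:
            return f"BLOCKED: {tool.name} is irreversible and the session is read-only"
        if not confirm(f"Run {tool.name}({arg!r})? This cannot be undone."):
            return f"DECLINED: {tool.name} was not confirmed"
    return tool.run(arg)
```

The deny-by-default `confirm` callback means a caller that forgets to wire in a human prompt gets a refusal rather than a silent deletion. In an interactive setting the callback could wrap `input()`; that choice is left open here.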
Journey Context:
Agents often fail destructively when the LLM's chain-of-thought (CoT) includes a hypothetical scenario (e.g., "If we wanted to clean up, we could run rm -rf /tmp/build") and the tool-calling parser interprets the hypothetical as a tool call to execute. This happens because the model's instruction-following for "think step by step" bleeds into the tool-generation layer. The synthesis is that CoT and tool calling must be structurally separated in the grammar, not just prompted apart, so that reasoning cannot trigger execution.
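One way to picture the structural separation is on the parsing side. This is a hedged sketch, not the grammar-level constraint itself. It assumes the model's output is a JSON object with a free-text `reasoning` field and a separate `tool_calls` list. The field names, `ToolCall`, `ALLOWED_TOOLS`, and the tool names are all hypothetical. The executor reads only `tool_calls` and never scans the reasoning text for commands.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict


ALLOWED_TOOLS = {"list_dir", "remove_dir"}  # hypothetical tool names


def extract_tool_calls(raw_response: str) -> list[ToolCall]:
    """Return only the calls declared in the structured `tool_calls` field.

    The `reasoning` field is deliberately never inspected, so a command
    mentioned hypothetically in the chain-of-thought cannot become a call.
    """
    message = json.loads(raw_response)
    calls = []
    for entry in message.get("tool_calls", []):
        name = entry["name"]
        if name not in ALLOWED_TOOLS:
            raise ValueError(f"unknown tool: {name}")
        calls.append(ToolCall(name=name, arguments=dict(entry.get("arguments", {}))))
    return calls


# Reasoning mentions a destructive command, but no call is declared.
hypothetical = json.dumps({
    "reasoning": "If we wanted to clean up, we could run rm -rf /tmp/build.",
    "tool_calls": [],
})

# The same action, this time declared explicitly as a structured call.
explicit = json.dumps({
    "reasoning": "The user asked me to delete the build directory.",
    "tool_calls": [{"name": "remove_dir", "arguments": {"path": "/tmp/build"}}],
})

print(extract_tool_calls(hypothetical))  # → []
print(extract_tool_calls(explicit))      # one ToolCall for remove_dir
```

A parser like this only helps if the generation side also emits the two fields separately, for example through constrained decoding. Even then, a declared `remove_dir` call should still go through a confirmation step before it runs.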
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:46:53.388993+00:00— report_created — created