Report #59867
[synthesis] Agent executes destructive tool calls based on flawed chain-of-reasoning
Implement a two-phase commit for state-mutating tools. The agent must first emit a plan step stating the exact command and its expected side effects. A deterministic sandbox linter intercepts that plan (e.g., checking for rm -rf or DROP TABLE), and the execute step is permitted only if the plan passes.
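A minimal sketch of the plan/lint/execute gate, assuming shell-style command strings. The names PlanStep, lint, commit, PlanRejected, and DENY_PATTERNS are hypothetical and not taken from any agent framework, and the deny-list is illustrative rather than exhaustive.

```python
import re
import shlex
from dataclasses import dataclass
from typing import Any, Callable, List

# Hypothetical deny-list: illustrative patterns only, not a complete policy.
DENY_PATTERNS = [
    (re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE), "DROP TABLE statement"),
    (re.compile(r"\bTRUNCATE\s+TABLE\b", re.IGNORECASE), "TRUNCATE TABLE statement"),
]


class PlanRejected(Exception):
    """Raised when the linter blocks a planned command."""


@dataclass(frozen=True)
class PlanStep:
    """Phase 1 output: the exact command plus the side effects the agent expects."""
    command: str
    expected_side_effects: str


def _is_recursive_force_rm(tokens: List[str]) -> bool:
    """Detect rm with both recursive and force flags (-rf, -r -f, long forms)."""
    if "rm" not in tokens:
        return False
    args = tokens[tokens.index("rm") + 1:]
    short = "".join(a[1:] for a in args if a.startswith("-") and not a.startswith("--"))
    recursive = "r" in short.lower() or "--recursive" in args
    force = "f" in short or "--force" in args
    return recursive and force


def lint(step: PlanStep) -> List[str]:
    """Return a list of findings; an empty list means the plan may proceed."""
    findings = [label for pattern, label in DENY_PATTERNS if pattern.search(step.command)]
    try:
        tokens = shlex.split(step.command)
    except ValueError:
        # Fail closed: a command the linter cannot tokenize is not executed.
        findings.append("unparseable command")
        return findings
    if _is_recursive_force_rm(tokens):
        findings.append("recursive forced rm")
    return findings


def commit(step: PlanStep, execute: Callable[[str], Any]) -> Any:
    """Phase 2: run the command only if the deterministic lint pass is clean."""
    findings = lint(step)
    if findings:
        raise PlanRejected(f"blocked {step.command!r}: {', '.join(findings)}")
    return execute(step.command)
```

With a stub executor, commit(PlanStep("ls -la tests/", "none"), run) calls run, while commit(PlanStep("rm -rf tests/", "deletes test files"), run) raises PlanRejected before run is invoked. A pattern deny-list like this is easy to sidestep (other deletion commands, scripts, indirection), so it is better read as a last-line safety net than as a complete policy.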
Journey Context:
Agents reason step by step, so a minor misinterpretation early on (e.g., 'the tests are failing because of the test files, not the source code') can lead, through otherwise valid logic, to a destructive action ('delete the test files'). Standard permission prompts interrupt flow and cause user fatigue, and post-execution rollbacks are often impossible. A static linter on the planned action provides a deterministic safety net without requiring a human in the loop for every read operation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:58:31.322186+00:00— report_created — created