Report #87803
[synthesis] Chain-of-reasoning leads to catastrophic tool calls because the agent inverts the goal and the constraint
Separate generating the tool call from executing it: inject a 'constraint reflection' prompt that forces the model to state what must not happen (the negative space) before it generates the actual command.
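A minimal Python sketch of the two-phase flow described above. It assumes the model is exposed as a plain function from prompt text to response text. The names `propose_tool_call`, `Proposal`, and `fake_model`, along with the prompt wording, are hypothetical and do not come from any particular agent framework. The function only returns the proposal and never runs the command. Execution is left to the caller, which keeps generation and execution separate.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: takes prompt text, returns response text.
ModelFn = Callable[[str], str]

# Assumed prompt wording; adapt to your own agent.
REFLECTION_PROMPT = (
    "Task: {goal}\n"
    "Before proposing any command, list what must NOT happen while doing this task "
    "(files, directories, or systems that must be preserved), one item per line."
)

GENERATION_PROMPT = (
    "Task: {goal}\n"
    "Constraints (must NOT happen):\n{constraints}\n"
    "Propose exactly one shell command that achieves the task without violating any constraint."
)


@dataclass
class Proposal:
    goal: str
    constraints: List[str]
    command: str


def propose_tool_call(goal: str, model: ModelFn) -> Proposal:
    """Constraint reflection first, command generation second; nothing is executed."""
    # Phase 1: make the model state the negative space before any command exists.
    reflection = model(REFLECTION_PROMPT.format(goal=goal))
    constraints = [c for c in (line.lstrip("-* ").strip() for line in reflection.splitlines()) if c]
    if not constraints:
        raise ValueError("no constraints stated; refusing to generate a command")

    # Phase 2: generate the command with the stated constraints in context.
    constraint_block = "\n".join(f"- {c}" for c in constraints)
    command = model(GENERATION_PROMPT.format(goal=goal, constraints=constraint_block)).strip()
    return Proposal(goal=goal, constraints=constraints, command=command)


def fake_model(prompt: str) -> str:
    """Stand-in model for demonstration only (hypothetical)."""
    if "list what must NOT happen" in prompt:
        return "- do not delete anything outside ./build\n- do not touch source files in ./src"
    return "rm -rf ./build"


if __name__ == "__main__":
    proposal = propose_tool_call("clean the build directory", fake_model)
    print(proposal.constraints)
    print(proposal.command)  # returned for review, not executed
```

Refusing to continue when the reflection step returns nothing is a deliberate design choice. An empty negative space is treated as a sign that the model never reasoned about boundaries, so no command is generated at all.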
Journey Context:
Agents typically reason about what they want to achieve (e.g., 'clean the build directory'). If the context is slightly poisoned or the prompt is ambiguous, the model can map that goal to an overly broad action (e.g., 'rm -rf /'). The failure does not come from misunderstanding the goal. It comes from failing to reason about the goal's boundaries. Standard prompting asks only 'What should I do?'. The synthesis indicates that catastrophic failures occur when agents lack a model of negative constraints. Forcing the agent to articulate what must be preserved before it acts inverts the reasoning chain and is intended to prevent over-execution.
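The prompt-level fix relies on the model honoring the constraints it stated. A complementary safeguard, which is not part of the report's method, is to check the articulated preserved set mechanically against the proposed command before execution. The sketch below flags a command when an 'rm' target is a preserved path, contains one, or lies inside one. It assumes POSIX paths and handles only a plain 'rm' invocation. Globs, symlinks, and wrappers such as 'sudo' are ignored. The helper names are hypothetical.

```python
import posixpath
import shlex
from typing import Iterable, List

# Hypothetical helpers; a sketch, not a complete shell parser.


def _is_within(path: str, root: str) -> bool:
    """True if `path` equals `root` or lies underneath it (absolute POSIX paths)."""
    return posixpath.commonpath([path, root]) == root


def rm_targets(command: str, cwd: str) -> List[str]:
    """Absolute, normalized targets of a plain `rm` command; empty for any other command."""
    argv = shlex.split(command)
    if not argv or posixpath.basename(argv[0]) != "rm":
        return []
    targets = []
    end_of_opts = False
    for arg in argv[1:]:
        if arg == "--" and not end_of_opts:
            end_of_opts = True
            continue
        if arg.startswith("-") and not end_of_opts:
            continue  # option such as -r, -f, -rf
        targets.append(posixpath.normpath(posixpath.join(cwd, arg)))
    return targets


def violations(command: str, preserved: Iterable[str], cwd: str) -> List[str]:
    """Preserved paths the command would touch: target equals, contains, or lies inside one."""
    keep = [posixpath.normpath(posixpath.join(cwd, p)) for p in preserved]
    hits: List[str] = []
    for target in rm_targets(command, cwd):
        for p in keep:
            # Deleting an ancestor removes p; deleting something under p modifies it.
            if (_is_within(p, target) or _is_within(target, p)) and p not in hits:
                hits.append(p)
    return hits


if __name__ == "__main__":
    preserved = ["src", ".git"]
    print(violations("rm -rf build", preserved, "/home/u/proj"))
    print(violations("rm -rf /", preserved, "/home/u/proj"))
```

With working directory /home/u/proj and preserved paths src and .git, 'rm -rf build' yields no violations. In contrast, 'rm -rf /' flags both preserved paths, because '/' is an ancestor of each one.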
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:57:42.292672+00:00 - report_created - created