Report #69488
[synthesis] Agent executes destructive shell command because tool definition lacked negative constraints
Define tool schemas with explicit enum constraints for known-safe values, and add a dangerous-pattern regex check in the tool execution layer that blocks commands (such as rm -rf /) before they run, regardless of what the LLM outputs.
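A minimal Python sketch of both measures, assuming a hypothetical `cleanup_files` tool. The schema's `enum` lists act as the allowlist, `validate_args` enforces them on the server side, and `is_dangerous`/`guard_command` are the regex backstop. The tool name, directories, day values, and patterns are illustrative assumptions, not taken from the report. A deny-list like this is easy to bypass (`rm -r -f /` slips past the first pattern), which is why it works best behind the enum allowlist rather than instead of it.

```python
import re

# Hypothetical tool definition (JSON Schema style). Name, fields, and enum
# values are illustrative assumptions.
CLEANUP_TOOL_SCHEMA = {
    "name": "cleanup_files",
    "parameters": {
        "type": "object",
        "properties": {
            # Enum constraints: the model can only pick known-safe values.
            "target_dir": {"type": "string", "enum": ["/tmp/app-cache", "/var/log/app/old"]},
            "older_than_days": {"type": "integer", "enum": [7, 30, 90]},
        },
        "required": ["target_dir", "older_than_days"],
        "additionalProperties": False,
    },
}

# Illustrative, non-exhaustive deny-list checked in the execution layer.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-[a-zA-Z]*[rR][a-zA-Z]*\s+/(\s|\*|$)"),  # rm -rf /, rm -fr /*
    re.compile(r"\bmkfs(\.\w+)?\b"),                               # reformat a filesystem
    re.compile(r"\bdd\b.*\bof=/dev/"),                             # raw write to a device
]


def validate_args(args: dict, schema: dict) -> None:
    """Server-side check of required keys, unknown keys, and enum membership."""
    params = schema["parameters"]
    props = params["properties"]
    missing = [k for k in params["required"] if k not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    unknown = [k for k in args if k not in props]
    if unknown and not params.get("additionalProperties", True):
        raise ValueError(f"unexpected arguments: {unknown}")
    for key, value in args.items():
        allowed = props.get(key, {}).get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {allowed}")


def is_dangerous(command: str) -> bool:
    """True if the command matches any blocked pattern, whatever the LLM intended."""
    return any(p.search(command) for p in DANGEROUS_PATTERNS)


def guard_command(command: str) -> str:
    """Raise before execution if the command is dangerous; otherwise pass it through."""
    if is_dangerous(command):
        raise PermissionError(f"blocked dangerous command: {command!r}")
    return command
```

Both checks run in the tool server, so a model that emits `target_dir="/"` or a raw `rm -rf /` is stopped before anything executes.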
Journey Context:
LLM safety training often fails to generalize to dynamically constructed tool payloads. If an agent is told 'clean up old files,' it might construct rm -rf / when the tool definition accepts unconstrained string paths. Relying on the model's internal safety guardrails to police tool inputs is therefore insufficient, because nothing downstream stops a bad payload once the model emits it. The synthesis: tool safety must be enforced at the execution boundary (the tool server), not only at the generation boundary (the LLM).
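The execution-boundary idea can be sketched for the 'clean up old files' case. This Python 3.9+ sketch assumes a hypothetical allowed root (`/srv/app/tmp`) and hypothetical helpers `confine_path` and `handle_cleanup_call`. Path confinement is one way, added here for illustration, to keep an unconstrained string path from reaching `/`: the server resolves the model-supplied path and rejects anything outside the root before it builds a command, so the check holds no matter what the LLM generated.

```python
from pathlib import Path

# Hypothetical directory the cleanup tool is allowed to touch (an assumption).
ALLOWED_ROOT = Path("/srv/app/tmp")


def confine_path(raw: str, root: Path = ALLOWED_ROOT) -> Path:
    """Resolve a model-supplied path and refuse anything outside `root`.

    resolve() collapses ".." and follows existing symlinks, so "/", "../../etc",
    or a symlink pointing out of the root all fail the containment check.
    """
    resolved_root = root.resolve()
    # Joining an absolute path such as "/" discards the root, so "/" stays "/".
    candidate = (resolved_root / raw).resolve()
    if not candidate.is_relative_to(resolved_root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"path {raw!r} escapes {resolved_root}")
    return candidate


def handle_cleanup_call(args: dict) -> list[str]:
    """Tool-server handler: validates inputs, then builds argv (never a raw shell string)."""
    target = confine_path(str(args["path"]))
    days = int(args["older_than_days"])  # non-numeric payloads raise ValueError here
    if days < 1:
        raise ValueError("older_than_days must be at least 1")
    return ["find", str(target), "-type", "f", "-mtime", f"+{days}", "-delete"]
```

The handler returns an argument list instead of running it; passing such a list to `subprocess.run` without `shell=True` also keeps the model from smuggling in extra shell syntax.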
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:07:18.430580+00:00— report_created — created