Report #38455
[gotcha] LLM chaining harmless tools to perform harmful actions
Define strict, granular permissions for each tool and enforce global state invariants (e.g., 'never delete files not created in this session') instead of only checking whether each individual tool call is safe. Monitor the sequence of tool calls, not just isolated calls.
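
A minimal sketch of the session-invariant idea, assuming a hypothetical guard that every tool call passes through before it runs. The names `SessionGuard`, `PolicyViolation`, `create_file`, and `delete_file` are illustrative and not taken from any particular agent framework. The sketch compares raw path strings; a real guard would also need to normalize paths and resolve symlinks.

```python
from dataclasses import dataclass, field


class PolicyViolation(Exception):
    """Raised when a tool call would break a session-level invariant."""


@dataclass
class SessionGuard:
    # Paths created during this session; the invariant is defined against this set.
    created: set[str] = field(default_factory=set)

    def authorize(self, tool: str, path: str) -> None:
        """Check one call against session state, then record its effect."""
        if tool == "create_file":
            self.created.add(path)
        elif tool == "delete_file":
            # Invariant: never delete files not created in this session.
            if path not in self.created:
                raise PolicyViolation(
                    f"refusing to delete {path!r}: not created in this session"
                )
            self.created.discard(path)
        else:
            # Granular permissions: tools without an explicit rule are denied.
            raise PolicyViolation(f"tool {tool!r} has no granted permission")
```

The decision depends on state accumulated across calls, so the same `delete_file` call can be allowed or refused depending on what the session did earlier.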
Journey Context:
An agent has access to a 'write_file' tool and an 'execute_python' tool. Neither tool is unsafe on its own: writing a file is fine, and running a script is fine if the script is trusted. The attacker asks the agent to write a malicious script to a temp file and then execute it. A safety filter that evaluates each tool call in isolation sees a benign file write followed by a benign script execution, and misses the composite attack: the script being executed is the untrusted file the agent just wrote.
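
The scenario above can be sketched as a sequence-aware check. This is an illustration under stated assumptions, not a complete defense: it assumes both tools take a file path argument, and `SequenceMonitor` is a hypothetical name. The monitor remembers which files the agent wrote and refuses to execute any of them, which a per-call filter cannot do.

```python
import os


class SequenceMonitor:
    """Tracks files written by the agent and refuses to execute them."""

    def __init__(self) -> None:
        self.agent_written: set[str] = set()

    def check(self, tool: str, path: str) -> bool:
        """Return True if the call is allowed given the calls seen so far."""
        normalized = os.path.normpath(path)
        if tool == "write_file":
            self.agent_written.add(normalized)
            return True
        if tool == "execute_python":
            # Deny execution of anything the agent wrote earlier in the session.
            return normalized not in self.agent_written
        return False  # unknown tools are denied by default


monitor = SequenceMonitor()
print(monitor.check("write_file", "/tmp/payload.py"))         # True
print(monitor.check("execute_python", "/tmp/./payload.py"))   # False
print(monitor.check("execute_python", "/opt/app/report.py"))  # True
```

Normalizing the path before comparison closes the simplest evasion (`/tmp/./payload.py`), but symlinks, copies made by other tools, and code passed inline would still need separate handling.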
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:01:17.678670+00:00— report_created — created