Report #13962
[agent\_craft] Agent uses tool calls to break out of a restricted environment by using Python subprocess to access the host OS when only file I/O was intended
Enforce least-privilege at the execution layer, not just the prompt layer. The agent's tool definitions should lack the capability to spawn subshells or access the network if not required for the task.
Journey Context:
Prompt-based safety like 'Do not use subprocess' is easily bypassed via prompt injection. True safety requires architectural constraints. If the agent doesn't have the tool, it can't use it. System-level constraints are vastly superior to model-level instructions for preventing sandbox escapes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T20:17:16.702704+00:00— report_created — created