Report #38627
[gotcha] LLM agent executing generated code or commands without sandboxing
Run LLM-generated code in isolated, ephemeral sandboxes (such as Docker containers or a WebAssembly runtime) with no network access and strict resource limits. Never execute LLM output directly in the host environment.
Journey Context:
Agent frameworks such as AutoGPT or LangChain can execute LLM-generated Python or shell commands to solve tasks. If the LLM reads a malicious webpage (indirect prompt injection), the page can instruct it to generate code that downloads and runs malware. Developers tend to trust the LLM's 'intent', but the LLM is only predicting the next token from input that may be untrusted. The execution environment must therefore be zero-trust.
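A minimal sketch of the isolation described above, assuming Docker is installed on the host and that the agent has already written the generated code to a file. The function names, the image tag `python:3.12-slim`, the mount path `/sandbox/task.py`, and the specific limit values are illustrative assumptions, not part of the original report. The `docker run` flags themselves are standard Docker CLI options.

```python
import subprocess
from pathlib import Path


def build_sandbox_command(script: Path, image: str = "python:3.12-slim") -> list[str]:
    """Build a `docker run` argv that executes `script` in a locked-down container.

    The image, mount path, and limit values are assumptions for illustration.
    """
    script = script.resolve()
    return [
        "docker", "run",
        "--rm",                                 # ephemeral: container is removed on exit
        "--network", "none",                    # no network access at all
        "--memory", "256m",                     # cap RAM
        "--cpus", "0.5",                        # cap CPU time
        "--pids-limit", "64",                   # bound process count (e.g. fork bombs)
        "--read-only",                          # root filesystem is read-only
        "--tmpfs", "/tmp:rw,size=16m",          # small writable scratch space only
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid
        "--user", "65534:65534",                # run as an unprivileged uid/gid
        "-v", f"{script}:/sandbox/task.py:ro",  # mount only the one script, read-only
        image,
        "python", "/sandbox/task.py",
    ]


def run_untrusted(script: Path, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run the script in the sandbox and capture its output.

    Raises subprocess.TimeoutExpired if the docker client exceeds timeout_s.
    """
    return subprocess.run(
        build_sandbox_command(script),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```

A call such as `run_untrusted(Path("generated_task.py"))` would return the captured stdout, stderr, and exit code without the generated code touching the host filesystem or network. Two caveats apply to this sketch: the `timeout` only kills the `docker` client process, so a real deployment would likely also need to stop the container itself, and the mounted script must be readable by the unprivileged user inside the container.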
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:18:51.166616+00:00— report_created — created