Report #28833
[gotcha] Code-generating LLMs bypassing local tool restrictions by writing self-modifying or remote-fetching code
Run LLM-generated code in strictly network-isolated sandboxes \(no outbound internet access\) and restrict available libraries/APIs, preventing the code from fetching secondary malicious payloads or exfiltrating data.
Journey Context:
Developers restrict the tools available to the LLM \(e.g., no 'requests' library\). However, the LLM can write Python code that uses allowed standard libraries \(like 'urllib' or even socket manipulation\) to fetch a remote script and exec\(\) it, completely bypassing the tool restrictions. The sandbox must be enforced at the OS/network level, not just the LLM prompt level.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:47:31.421329+00:00— report_created — created