Report #5779
[agent_craft] Autonomous coding agent in a loop accumulates unsafe state because safety checks only run at input time
Implement safety evaluation at every decision point in the agent loop: before writing files, before executing shell commands, before making network calls, before returning output to the user. Safety is not a gateway you pass through once; it is a continuous checkpoint. Classify generated code for safety before executing it, regardless of how safe the original plan appeared.
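A minimal sketch of the checkpoint idea, assuming a hypothetical agent whose every side effect is routed through a single `guarded_execute` function. The names `Action`, `ActionKind`, `Policy`, `deny_destructive_shell`, and `guarded_execute` are invented for illustration and do not come from any particular agent framework; the single policy shown is a toy stand-in for a real safety classifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional


class ActionKind(Enum):
    # The four decision points named in the report.
    WRITE_FILE = "write_file"
    RUN_SHELL = "run_shell"
    NETWORK_CALL = "network_call"
    RETURN_OUTPUT = "return_output"


@dataclass(frozen=True)
class Action:
    kind: ActionKind
    payload: str  # file contents, command line, URL, or response text


class UnsafeActionError(Exception):
    """Raised when a checkpoint rejects an action."""


# Hypothetical policy signature: return a reason string if unsafe, else None.
Policy = Callable[[Action], Optional[str]]


def deny_destructive_shell(action: Action) -> Optional[str]:
    # Toy rule for illustration only; a real policy would be far broader.
    if action.kind is ActionKind.RUN_SHELL and "rm -rf /" in action.payload:
        return "destructive shell command"
    return None


def guarded_execute(
    action: Action,
    policies: Iterable[Policy],
    executor: Callable[[Action], str],
) -> str:
    # Every action is evaluated immediately before it runs,
    # no matter how safe the original plan looked.
    for policy in policies:
        reason = policy(action)
        if reason is not None:
            raise UnsafeActionError(f"{action.kind.value}: {reason}")
    return executor(action)
```

In a loop built this way, the agent never calls the file system, shell, or network directly; it constructs an `Action` and passes it to `guarded_execute`, so a step that was not anticipated by the initial plan still hits the same checks.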
Journey Context:
In a chat interface, safety checks at input time are generally sufficient because there is one response per turn. In an autonomous agent that plans, writes, executes, and iterates, the initial safety check does not cover behavior that emerges later in the loop. An agent might start with a safe plan to refactor an authentication module and, through iterative execution, produce code that stores passwords in plaintext or exposes secrets in logs. LLM02 (Insecure Output Handling) and LLM06 (Sensitive Information Disclosure) from the OWASP Top 10 for LLM Applications both become significantly more likely in autonomous loops where the human is out of the loop. The fix: safety is a checkpoint at every action boundary, not a toll booth at the entrance.
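To make the authentication example concrete, here is a rough sketch of scanning generated code before it is written or run. The function `scan_generated_code` and the two regular expressions are assumptions made for illustration; they are crude proxies for the failure modes described above (a hardcoded password literal, a secret passed to a log call) and would miss many real cases.

```python
import re
from typing import List

# Illustrative patterns only; not an exhaustive or production-grade rule set.
RISK_PATTERNS = {
    # A password assigned from a string literal, e.g. password = "hunter2"
    "hardcoded_password_literal": re.compile(
        r"""password\s*=\s*["'][^"']+["']""", re.IGNORECASE
    ),
    # A log or print call whose arguments mention a secret-like name.
    "secret_in_log": re.compile(
        r"""(log(ger|ging)?\.\w+|print)\s*\(.*(password|secret|api_key|token)""",
        re.IGNORECASE,
    ),
}


def scan_generated_code(source: str) -> List[str]:
    """Return one finding per (line, pattern) match in the generated source."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in RISK_PATTERNS.items():
            if pattern.search(line):
                findings.append(f"line {lineno}: {name}")
    return findings
```

An agent loop could call such a scan on each iteration's output and refuse to write or execute the code while findings remain, rather than relying on the safety of the original refactoring plan.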
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T22:11:12.092528+00:00— report_created — created