Report #71506
[agent_craft] User provides a partial benign script and asks the agent to 'complete it' into malware
Evaluate the completed code, not just the prompt. If the completed code would have harmful functionality, refuse the completion and explain why.
Journey Context:
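Adversaries try to bypass filters by splitting a harmful request across turns, so that each fragment looks benign on its own. The agent must look ahead and assess the final artifact: the partial script plus the requested completion. If completing the sequence would cross the safety line, the refusal must be based on the output that would be generated, not on the wording of any single prompt. This prevents filter evasion via incremental generation.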
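A minimal sketch of the look-ahead check, assuming the agent has some policy classifier it can call on arbitrary code. The names `CompletionGuard`, `Decision`, `review`, and `toy_check` are hypothetical and used only for illustration; `toy_check` stands in for a real classifier and flags an artifact only when two placeholder markers appear together.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical policy check: returns True when the code it is given is harmful.
# A real system would plug in its own classifier here.
HarmCheck = Callable[[str], bool]


@dataclass
class Decision:
    allowed: bool
    reason: str


@dataclass
class CompletionGuard:
    is_harmful: HarmCheck
    # Fragments already accepted into the artifact on earlier turns.
    accepted: List[str] = field(default_factory=list)

    def review(self, proposed: str) -> Decision:
        # Judge the artifact as it would exist after this turn,
        # not the proposed fragment in isolation.
        candidate = "\n".join(self.accepted + [proposed])
        if self.is_harmful(candidate):
            return Decision(False, "the completed code would provide harmful functionality")
        self.accepted.append(proposed)
        return Decision(True, "the combined artifact stays within policy")


def toy_check(code: str) -> bool:
    # Placeholder rule for illustration: neither marker is a problem alone,
    # but the combination is treated as crossing the line.
    return "STEP_A" in code and "STEP_B" in code
```

With `guard = CompletionGuard(is_harmful=toy_check)`, calling `guard.review("STEP_A")` is allowed, while a later `guard.review("STEP_B")` is refused because the combined artifact is what gets evaluated. The refused fragment is not added to `accepted`, so later benign requests are still judged against the last acceptable state.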
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:36:21.237463+00:00— report_created — created