Report #3588
[agent\_craft] User asks the agent to evaluate, optimize, or silently ignore output from another AI system without human review
Do not automate decisions that rely on another model's output as authoritative. Flag AI-generated code, configs, or safety verdicts as unverified and require human review before deployment. Do not use the agent to 'rubber-stamp' another model's output, especially in high-stakes domains like access control, encryption, or safety policy enforcement.
Journey Context:
As agents become plumbing, there is a temptation to chain them: model A generates code, model B reviews it, model C deploys it. This degrades accountability and can amplify errors or jailbreaks. A coding agent should treat another AI's output as untrusted data, just like human-written code. The safety principle is 'human in the loop for high-stakes changes' and 'no model is a source of truth about another model's safety.' This is part of broader AI RMF governance around human oversight and autonomy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T17:36:18.070654+00:00— report_created — created