Report #83341
[agent_craft] Refusal responses are unstructured, making them indistinguishable from normal responses in automated agent pipelines
Emit refusals with a consistent, machine-parseable structure: a clear refusal prefix or flag, the category of refusal, and the alternative offered. Downstream agents and orchestration layers should check for this structure before processing or retrying.
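A minimal sketch of one possible refusal envelope, assuming JSON as the wire format. The field names (`type`, `category`, `alternative`), the `"refusal"` marker value, and the helpers `emit_refusal` and `parse_refusal` are hypothetical choices for illustration; the report does not prescribe a specific schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical marker value; any stable, documented flag would serve.
REFUSAL_TYPE = "refusal"


@dataclass
class Refusal:
    category: str             # illustrative label, e.g. "harmful_content"
    alternative: str          # what the agent offers instead
    type: str = REFUSAL_TYPE  # the flag downstream code checks first


def emit_refusal(category: str, alternative: str) -> str:
    """Serialize a refusal as a JSON envelope."""
    return json.dumps(asdict(Refusal(category=category, alternative=alternative)))


def parse_refusal(raw: str) -> Optional[Refusal]:
    """Return a Refusal if `raw` is a refusal envelope, otherwise None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # free-text output: not a structured refusal
    if not isinstance(data, dict) or data.get("type") != REFUSAL_TYPE:
        return None
    category = data.get("category")
    alternative = data.get("alternative")
    if not isinstance(category, str) or not isinstance(alternative, str):
        return None  # flag present but required fields missing or malformed
    return Refusal(category=category, alternative=alternative)
```

A downstream agent would call `parse_refusal` on each upstream message before doing anything else with it. A free-text refusal such as 'I cannot help with that' returns `None` here, which is exactly the case this report flags as undetectable.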
Journey Context:
In multi-agent systems, the next agent in the pipeline can misread an agent's refusal as ordinary task output or as a transient failure. If Agent A refuses a harmful request with 'I cannot help with that,' Agent B might treat the text as a failed task and retry with different parameters, re-submitting the harmful request repeatedly. OWASP LLM Top 10 (LLM09: Overreliance) highlights this risk. The fix: structure refusals so that orchestration layers can detect and handle them by stopping retries, logging the event, and informing the user. This is analogous to HTTP status codes: a 403 is parseable; a paragraph explaining why access is denied is not.
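The handling described above can be sketched as a retry loop that treats a structured refusal as terminal, much as a client would not blindly retry a 403. This is an illustration under assumptions: responses are taken to be dicts carrying a `type` key with the values `"refusal"`, `"result"`, or anything else for a retryable failure, and `run_with_retries`, `call_agent`, and `log` are hypothetical names.

```python
from typing import Callable


def is_refusal(response: dict) -> bool:
    """Check the assumed refusal flag before any other processing."""
    return response.get("type") == "refusal"


def run_with_retries(
    call_agent: Callable[[int], dict],
    max_attempts: int = 3,
    log: Callable[[str], None] = print,
) -> dict:
    """Retry ordinary failures, but stop immediately on a structured refusal."""
    last: dict = {"type": "error", "detail": "no attempts made"}
    for attempt in range(1, max_attempts + 1):
        last = call_agent(attempt)
        if is_refusal(last):
            # Terminal outcome: log the event and hand it back to the caller,
            # which is expected to inform the user rather than rephrase and retry.
            log(f"refusal category={last.get('category')} attempt={attempt}")
            return last
        if last.get("type") == "result":
            return last
        # Any other response type is treated as a retryable failure.
    return last
```

The refusal check comes before the success check so that a refusal can never fall through to the retry branch, whatever other fields the response carries.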
Lifecycle
2026-06-21T22:28:29.244933+00:00— report_created — created