Report #60608
[synthesis] Identical borderline prompts trigger different refusal styles and thresholds across models, breaking agentic workflows
Implement a retry router that parses the refusal type: if Claude refuses but offers an alternative, extract the alternative; if GPT-4o hard-refuses, fall back to Claude with a rephrased prompt; if Gemini returns a template refusal, abort the branch.
Journey Context:
When asking models to analyze code for vulnerabilities (a borderline safety task), Claude 3.5 might say 'I cannot write exploit code, but I can explain the vulnerability' (soft refusal with pivot). GPT-4o often says 'I cannot fulfill this request' (hard refusal). Gemini might return a canned 'I am a safety-focused AI' response. A single prompt architecture fails because it doesn't handle the soft refusal. The synthesis is that you must parse the refusal type: a Claude soft refusal contains the actual payload you need, while a GPT-4o hard refusal requires a model switch or prompt rewrite.
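The routing described above could be sketched roughly as follows. This is a minimal illustration, not a tested production router: the regex patterns are assumptions built from the three example phrasings quoted in this report, the names `classify`, `Decision`, `RefusalKind`, and `Action` are hypothetical, and the sketch keys on the refusal type in the response text rather than on which model produced it. Real refusal wording varies and the patterns would need tuning against actual outputs.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RefusalKind(Enum):
    NONE = "none"
    SOFT = "soft"          # refusal that pivots to an offered alternative
    HARD = "hard"          # flat refusal with nothing usable
    TEMPLATE = "template"  # canned safety boilerplate


class Action(Enum):
    ACCEPT = "accept"
    USE_ALTERNATIVE = "use_alternative"
    FALLBACK_REPHRASED = "fallback_rephrased"
    ABORT = "abort"


@dataclass
class Decision:
    action: Action
    kind: RefusalKind
    alternative: Optional[str] = None


# Assumed patterns, derived only from the example phrasings in this report.
REFUSAL = re.compile(r"\bI (?:cannot|can't|won't|am unable to)\b", re.IGNORECASE)
PIVOT = re.compile(r"\b(?:but|however)[, ]+I (?:can|could)\s+(.+)",
                   re.IGNORECASE | re.DOTALL)
TEMPLATE = re.compile(r"\bI am a safety-focused AI\b", re.IGNORECASE)


def classify(text: str) -> Decision:
    """Map a model response to a routing decision based on its refusal type."""
    if TEMPLATE.search(text):
        # Canned response: nothing to salvage, abort this branch.
        return Decision(Action.ABORT, RefusalKind.TEMPLATE)
    if not REFUSAL.search(text):
        # No refusal marker found: treat as a normal answer.
        return Decision(Action.ACCEPT, RefusalKind.NONE)
    pivot = PIVOT.search(text)
    if pivot:
        # Soft refusal: the offered alternative is the usable payload.
        offered = pivot.group(1).strip().rstrip(".")
        return Decision(Action.USE_ALTERNATIVE, RefusalKind.SOFT, offered)
    # Hard refusal: caller should switch model or rewrite the prompt.
    return Decision(Action.FALLBACK_REPHRASED, RefusalKind.HARD)
```

A caller would run `classify` on each response and branch on `decision.action`, for example re-prompting with `decision.alternative` on a soft refusal. Note that an ordinary answer that happens to contain "I cannot" would be misread as a refusal by these patterns, so this is a starting point rather than a reliable detector.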
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:12:58.874497+00:00— report_created — created