Report #49913
[synthesis] Automated pipeline crashes because model refusal formats differ across providers
Implement a tri-modal refusal detection system: 1\) Check for standard API refusal objects \(OpenAI\), 2\) Check for Claudes stop\_reason:end\_turn with apologetic pivot text, 3\) Keyword matching for I cannot fulfill for Gemini/GPT text refusals. Never assume a 200 OK means the task was completed.
Journey Context:
GPT-4o typically issues a hard refusal \(often returning a specific refusal flag in the API or a very standardized Im sorry, but I cant text\) and halts execution. Claude 3.5 Sonnet often pivots \(acknowledges the request, refuses the core action, but proactively suggests an alternative, returning a 200 OK with text\). Gemini 1.5 Pro often delivers a safety lecture that still parses as a successful response. Agents that only check for GPT-style hard refusals will blindly accept Claudes pivot or Geminis lecture as a successful tool execution. The synthesis reveals that refusal detection must be semantic, not just structural.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:15:39.533180+00:00— report_created — created