Report #48132
[synthesis] Agent cannot programmatically detect Claude refusals because they are embedded in response text rather than signaled structurally, as GPT-4o's are
For GPT-4o, check the `refusal` field on the message object in the API response. For Claude, implement text-based refusal detection: check that `stop_reason` is `end_turn` (rather than `tool_use`), then scan the response text for refusal patterns ('I can't', 'I'm not able to', 'I won't', 'I apologize, but'). Build a unified refusal-detection adapter that abstracts over both signaling mechanisms.
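A minimal sketch of such an adapter, assuming the responses have already been converted to plain dicts (an OpenAI message dict with an optional `refusal` key, and a Claude response dict with `stop_reason` and a list of `content` blocks). The names `RefusalResult`, `detect_openai_refusal`, `detect_claude_refusal`, and `detect_refusal` are hypothetical and not part of either SDK. The phrase list is the one from this report; it is a heuristic and will produce false positives on benign text such as "I can't find that file".

```python
import re
from dataclasses import dataclass
from typing import Any, Optional

# Phrases taken from this report; a heuristic list, not an exhaustive one.
# Both straight and curly apostrophes are accepted.
_REFUSAL_RE = re.compile(
    r"\b(I can['’]t|I['’]m not able to|I won['’]t|I apologize, but)",
    re.IGNORECASE,
)


@dataclass
class RefusalResult:
    """Hypothetical model-agnostic result type."""
    refused: bool
    source: str  # "structured", "text_pattern", or "none"
    detail: Optional[str] = None


def detect_openai_refusal(message: dict[str, Any]) -> RefusalResult:
    """Check the structured `refusal` field of a chat completion message dict."""
    refusal = message.get("refusal")
    if refusal:
        return RefusalResult(True, "structured", refusal)
    return RefusalResult(False, "none")


def detect_claude_refusal(response: dict[str, Any]) -> RefusalResult:
    """Heuristic: stop_reason is end_turn and the text matches a refusal phrase."""
    if response.get("stop_reason") != "end_turn":
        # For example tool_use: the model is still working, so not treated as a refusal.
        return RefusalResult(False, "none")
    text = " ".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
    match = _REFUSAL_RE.search(text)
    if match:
        return RefusalResult(True, "text_pattern", match.group(0))
    return RefusalResult(False, "none")


def detect_refusal(provider: str, payload: dict[str, Any]) -> RefusalResult:
    """Dispatch to the provider-specific check behind one interface."""
    if provider == "openai":
        return detect_openai_refusal(payload)
    if provider == "anthropic":
        return detect_claude_refusal(payload)
    raise ValueError(f"unknown provider: {provider}")
```

The `source` field lets callers treat a structured refusal as reliable and a text-pattern match as a hint that may need a second check.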
Journey Context:
OpenAI's chat completion API returns refusals in a structured `refusal` field on the message object, making programmatic detection trivial. Claude embeds refusals in natural-language text with no distinct API signal: `stop_reason` remains `end_turn`, the same as for a normal completion. The problem is compounded by Claude's more granular refusal behavior: it may refuse one step of a multi-step task while completing the others, producing partial-refusal responses that are hard to detect. A unified agent framework needs a model-agnostic refusal interface that combines structured field checks (GPT-4o) with text pattern matching (Claude).
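One way to surface the partial-refusal case is to match refusal phrases per line instead of over the whole response. The sketch below is a heuristic under stated assumptions: it assumes each step of the task appears on its own line, and the function name `classify_refusal_scope` and the labels "none", "partial", and "full" are hypothetical.

```python
import re

# Same heuristic phrase list as in the report; not exhaustive.
_REFUSAL_RE = re.compile(
    r"\b(I can['’]t|I['’]m not able to|I won['’]t|I apologize, but)",
    re.IGNORECASE,
)


def classify_refusal_scope(text: str) -> tuple[str, list[str]]:
    """Return a scope label and the non-empty lines that match a refusal phrase."""
    segments = [line.strip() for line in text.splitlines() if line.strip()]
    refused = [seg for seg in segments if _REFUSAL_RE.search(seg)]
    if not refused:
        return "none", []
    if len(refused) == len(segments):
        return "full", refused
    # Some lines refuse while others report completed work.
    return "partial", refused
```

A "partial" result can be routed to a retry or a human review of only the refused steps, rather than discarding the whole response.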
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:16:01.816677+00:00— report_created — created