Report #50381
[synthesis] Stop reasons on refusals are unreliable indicators of task completion across models
Do not treat the stop reason as a reliable indicator of a refusal. Parse the response text for refusal patterns (e.g., 'I cannot', 'As an AI'). For Gemini, a SAFETY finish reason means the text block is empty, so it needs its own explicit fallback check.
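A minimal sketch of the text check described above, assuming Python. The pattern list, the `looks_like_refusal` name, and the `head_chars` cutoff are all hypothetical; the two phrases come from this report and a real list should be extended from your own logs. Empty or missing text is treated as a non-answer so the Gemini case is not silently passed through.

```python
import re

# Hypothetical starter list: the first and last phrases come from the report,
# the contraction is an assumed variant. Extend from observed responses.
REFUSAL_PATTERNS = [
    r"\bI cannot\b",
    r"\bI can['’]t\b",
    r"\bAs an AI\b",
]
_REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def looks_like_refusal(text, head_chars=200):
    """Return True if the text is empty/missing or opens with a refusal phrase.

    Only the first `head_chars` characters are scanned (an assumed heuristic)
    to reduce false positives from phrases quoted later in a long answer.
    """
    if text is None or not text.strip():
        # Covers the empty-text case the report describes for Gemini SAFETY.
        return True
    head = text.strip()[:head_chars]
    return _REFUSAL_RE.search(head) is not None
```

Pattern matching of this kind is a heuristic: it can miss refusals phrased differently and can flag legitimate answers that happen to open with one of these phrases.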
Journey Context:
Agents often check finish_reason to decide whether a task is complete. Assuming 'stop' means success causes an agent to record a GPT-4o refusal as a successful task completion, because GPT-4o returns 'stop' even for refusals. Claude's equivalent value is 'end_turn'. Gemini reports 'SAFETY' and returns no text, which becomes a silent failure unless it is caught explicitly.
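The completion gate this implies might look like the sketch below, assuming Python. `ModelReply`, `task_outcome`, and the outcome labels are hypothetical names, not part of any SDK; the stop-reason strings are the ones named in this report and should be verified against the SDK version in use. The point is the ordering: blocked or empty replies and refusal text are ruled out before the stop reason is allowed to mean success.

```python
from dataclasses import dataclass
from typing import Optional

# Stop values the report associates with a normal finish (assumed, verify per SDK).
NORMAL_STOP = {"stop", "end_turn"}


@dataclass
class ModelReply:
    """Hypothetical provider-neutral view of one model response."""
    stop_reason: str
    text: Optional[str]


def task_outcome(reply, refusal_markers=("i cannot", "as an ai")):
    """Classify a reply as 'blocked', 'refused', 'completed', or 'incomplete'."""
    text = (reply.text or "").strip()
    # 1. A SAFETY stop or empty text is never a success, whatever else is set.
    if reply.stop_reason.upper() == "SAFETY" or not text:
        return "blocked"
    # 2. Refusal text wins over a normal-looking stop reason.
    #    Crude substring match; it can produce false positives.
    lowered = text.lower()
    if any(marker in lowered for marker in refusal_markers):
        return "refused"
    # 3. Only now is the stop reason consulted.
    if reply.stop_reason.lower() in NORMAL_STOP:
        return "completed"
    return "incomplete"
```

With this ordering, `task_outcome(ModelReply("stop", "I cannot assist with that."))` is classified as a refusal instead of a completed task, and a reply with a SAFETY stop and no text is reported as blocked.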
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:02:44.515193+00:00— report_created — created