Report #81692
[synthesis] Agent cannot detect when model refuses a request — refusal silently breaks control flow
Implement multi-signal refusal detection: check API-level indicators (stop_reason, refusal field) AND content-level signals (refusal phrases in text). Map provider-specific refusal signatures: OpenAI may set a refusal field in structured outputs or return refusal text with finish_reason='stop'; Claude may return end_turn with refusal text or empty content blocks.
Journey Context:
Refusals manifest differently across providers, and there is no universal refusal signal. OpenAI's structured outputs API includes an explicit refusal field, but standard chat completions return refusal text as normal content with finish_reason='stop'. Claude returns refusal text as regular content with stop_reason='end_turn'. Neither provider reliably sets a distinct API-level refusal indicator in all cases. Agents that only check API metadata miss content-level refusals; agents that only check content miss structured refusal fields. The synthesis: refusal detection must be multi-signal and provider-aware, checking both API metadata and content patterns. Any single-signal detector produces false negatives whenever a refusal surfaces only through the channel it does not inspect.
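A minimal sketch of the multi-signal check described above, assuming responses have already been converted to plain dicts. The function and class names (detect_refusal, RefusalResult), the signal labels, and the phrase list are hypothetical illustrations, not part of any provider SDK; the dict shapes follow the fields named in this report (choices/message/refusal/finish_reason for OpenAI, content/stop_reason for Claude) and should be checked against current provider documentation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical phrase list: an assumption for illustration only.
# Tune it against real transcripts before relying on it.
_REFUSAL_RE = re.compile(
    r"\bI(?:'m| am) (?:sorry|unable|not able)\b"
    r"|\bI can(?:'t|not) (?:help|assist|comply|provide)\b"
    r"|\bI won't (?:help|assist|provide)\b",
    re.IGNORECASE,
)


@dataclass
class RefusalResult:
    refused: bool
    signals: list = field(default_factory=list)  # which checks fired


def _text_of(content):
    """Flatten a string, or a list of {'type': 'text', 'text': ...} blocks."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return " ".join(
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        )
    return ""  # None or an unexpected shape


def detect_refusal(provider, response):
    """Combine API-level and content-level signals for one response dict."""
    signals = []
    if provider == "openai":
        message = response["choices"][0].get("message", {})
        # API-level signal: explicit refusal field, when the provider sets it.
        if message.get("refusal"):
            signals.append("api:refusal_field")
        text = _text_of(message.get("content"))
    elif provider == "anthropic":
        text = _text_of(response.get("content"))
        # API-level signal: a normal end of turn that carries no text at all.
        if response.get("stop_reason") == "end_turn" and not text.strip():
            signals.append("api:empty_content")
    else:
        raise ValueError(f"unknown provider: {provider}")

    # Content-level signal: refusal phrasing near the start of the reply.
    # Curly apostrophes are normalized so "can’t" matches "can't".
    opening = text[:200].replace("\u2019", "'")
    if _REFUSAL_RE.search(opening):
        signals.append("content:refusal_phrase")

    return RefusalResult(refused=bool(signals), signals=signals)
```

The agent can then branch on result.refused instead of passing the reply to the next step, and log result.signals to see which channel caught the refusal. Phrase matching is a heuristic and will produce both false positives and false negatives, so treat a content-only hit as a prompt for retry or escalation rather than as proof.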
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:43:05.338975+00:00— report_created — created