Report #70123
[synthesis] content filter refusals invisible when switching from GPT-4o to Claude
Implement dual-path refusal detection: (1) Check finish_reason/stop_reason: GPT-4o sets finish_reason='content_filter' and may return empty content, while Claude sets stop_reason='end_turn' with the refusal text in content. (2) Parse content for refusal language as a fallback for both models. Never rely solely on stop reasons, because Claude's refusals look identical to normal end_turn responses at the API level.
Journey Context:
The most dangerous cross-model bug is assuming refusals always have a distinct stop reason. GPT-4o's content_filter finish_reason is a reliable signal: when it is triggered, content is often empty or contains a generic refusal. Claude's refusals, however, come with stop_reason='end_turn' and contain natural-language refusal text that is structurally identical to a normal response. An agent that only checks stop_reason will miss Claude refusals entirely. An agent that only checks content for refusal keywords will miss GPT-4o's empty-content refusals. You need both checks, and you need to know which model you're calling to weight the appropriate signal.
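A minimal sketch of the two checks described above, written against plain values rather than any SDK object so it can be adapted to either client. The names `detect_refusal`, `looks_like_refusal`, `RefusalCheck`, the provider labels, and the phrase list are all hypothetical and chosen for illustration; the phrase list in particular is a rough heuristic that would need tuning against real traffic.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical phrase list: an assumption for illustration, not an exhaustive
# or validated set of refusal openers.
REFUSAL_PHRASES = (
    "i can't help", "i cannot help",
    "i can't assist", "i cannot assist",
    "i'm unable to", "i am unable to",
    "i'm not able to", "i won't be able to",
)


@dataclass
class RefusalCheck:
    refused: bool
    signal: str  # "stop_reason", "content", or "none"


def looks_like_refusal(text: str) -> bool:
    """Heuristic content check: look for refusal phrasing near the start."""
    # Normalise curly apostrophes and only scan the opening of the reply,
    # which reduces false positives from long answers that quote a refusal.
    head = text.strip().lower().replace("\u2019", "'")[:200]
    return any(phrase in head for phrase in REFUSAL_PHRASES)


def detect_refusal(provider: str,
                   stop_reason: Optional[str],
                   text: Optional[str]) -> RefusalCheck:
    """Dual-path check: explicit stop reason first, then content fallback."""
    text = text or ""

    # Path 1: stop-reason signal. Per the report, only GPT-4o exposes a
    # distinct value here; Claude refusals arrive as a normal 'end_turn'.
    if provider == "openai" and stop_reason == "content_filter":
        return RefusalCheck(True, "stop_reason")

    # Path 2: content fallback, applied to both providers.
    if looks_like_refusal(text):
        return RefusalCheck(True, "content")

    return RefusalCheck(False, "none")
```

With the official Python clients, the inputs would typically come from `response.choices[0].finish_reason` and `response.choices[0].message.content` for OpenAI, and from `response.stop_reason` and the text of the first content block for Anthropic. Returning which signal fired, rather than a bare boolean, makes it easier to log how often each path is responsible for a detection per model.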
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:17:05.446180+00:00 - report_created - created