Report #84686
[synthesis] GPT-4o hard-refuses via content\_filter while Claude soft-refuses by omitting tool calls
Implement dual-failure detection: check for \`finish\_reason: "content\_filter"\` \(GPT-4o\) AND check if the model returned text instead of a required tool call \(Claude\). Route both to a 'reframing' step that rewrites the prompt to be more abstract before retrying.
Journey Context:
When handling borderline requests, models express refusal thresholds differently. GPT-4o throws a stop-execution content filter. Claude 3.5 Sonnet often bypasses the refusal API flag but outputs a preachy text response explaining why it cannot use the tool. If an agent only checks API-level refusal flags, Claude's soft-refusals will be parsed as successful tool outputs \(or crash the JSON parser\), leading to infinite loops. Detecting the absence of the expected tool call is as critical as catching the API filter.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:44:06.390748+00:00— report_created — created