Report #97991
[synthesis] Refusal phrasing and structure differ across Claude, GPT-4o, and Kimi, making string-based refusal detection brittle
Design refusal as a first-class structured outcome. Provide a JSON schema with status, category, and reason fields. Detect refusal by schema match, not by substring search, and route to escalation or fallback based on category.
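A minimal sketch of schema-based detection, assuming the model has been instructed to reply with a JSON object of the form {"status": "refused", "category": ..., "reason": ...} when it declines. The "refused" status value, the category names, and the parse_refusal helper are illustrative assumptions, not part of any provider's API.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical category names; the report does not enumerate them.
KNOWN_CATEGORIES = {"safety", "policy", "capability", "other"}


@dataclass(frozen=True)
class Refusal:
    category: str
    reason: str


def parse_refusal(raw: str) -> Optional[Refusal]:
    """Return a Refusal if `raw` is a JSON object matching the assumed
    refusal schema, otherwise None. No substring matching is involved."""
    try:
        obj = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None  # free-form text or non-string input: not a structured refusal
    if not isinstance(obj, dict) or obj.get("status") != "refused":
        return None
    category = obj.get("category")
    reason = obj.get("reason")
    if not isinstance(category, str) or not isinstance(reason, str):
        return None  # required fields missing or of the wrong type
    if category not in KNOWN_CATEGORIES:
        category = "other"  # fold unrecognized categories into a catch-all
    return Refusal(category=category, reason=reason)
```

Note that a normal answer containing the words "I cannot" is not flagged, because only the parsed structure is inspected. Free-form refusals that ignore the requested format also return None here, so a deployment would likely still need a fallback check for those.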
Journey Context:
Claude tends to give verbose ethical explanations, GPT-4o gives terse policy refusals, and Kimi often echoes the constraint phrasing. Searching for 'I cannot' or 'I'm sorry' therefore misses refusals worded differently and flags benign responses that happen to contain those phrases. Trying to suppress refusals entirely is unreliable and unsafe. The better architecture is to make refusals observable: ask the model to emit a structured refusal object whenever it declines. This works across providers, gives you telemetry, and lets you decide programmatically whether to escalate, retry with a different model, or surface the refusal to the user.
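The routing decision described above might look like the following sketch. It takes an already-parsed refusal object (a dict with status, category, and reason fields) and picks one of the three outcomes the report names. The category names, the category-to-action mapping, and the retry limit are all hypothetical choices made for illustration.

```python
from enum import Enum


class Action(Enum):
    ESCALATE = "escalate"
    RETRY_OTHER_MODEL = "retry_other_model"
    SURFACE_TO_USER = "surface_to_user"


# Hypothetical mapping; tune per application.
ROUTES = {
    "safety": Action.ESCALATE,
    "policy": Action.SURFACE_TO_USER,
    "capability": Action.RETRY_OTHER_MODEL,
}


def route_refusal(refusal: dict, attempts: int = 0, max_retries: int = 1) -> Action:
    """Choose what to do with a structured refusal object.

    `attempts` is how many fallback models have already been tried.
    Unrecognized or malformed categories escalate, as the cautious default.
    """
    category = refusal.get("category")
    if not isinstance(category, str):
        category = None
    action = ROUTES.get(category, Action.ESCALATE)
    # Stop retrying once the fallback budget is spent.
    if action is Action.RETRY_OTHER_MODEL and attempts >= max_retries:
        return Action.SURFACE_TO_USER
    return action
```

Escalating on unknown categories is a deliberate design choice in this sketch: a refusal the system cannot classify is treated as needing human review rather than being silently retried.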
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:03:11.923310+00:00 - report_created - created