Report #76872
[cost_intel] Assuming reasoning models have identical refusal patterns to instruct models for edge-case content
Expect a 15-20% higher refusal rate on edge-case content (borderline medical/legal advice) from reasoning models, due to deliberative alignment; use instruct models for gray-area policy enforcement.
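The recommendation above amounts to a routing rule. A minimal Python sketch, assuming an upstream classifier already tags each request with a content category and whether it is a policy-enforcement call. The model identifiers, category names, `Request` type, and `choose_model` helper are hypothetical placeholders, not any provider's API:

```python
from dataclasses import dataclass

# Hypothetical model identifiers: substitute the models you actually deploy.
INSTRUCT_MODEL = "instruct-model"
REASONING_MODEL = "reasoning-model"

# Hypothetical gray-area categories, based on the report's examples
# (borderline medical/legal advice). Extend to match your own policy taxonomy.
GRAY_AREA_CATEGORIES = frozenset({"medical_advice", "legal_advice"})


@dataclass(frozen=True)
class Request:
    category: str                # label from an assumed upstream classifier
    is_policy_enforcement: bool  # True for moderation / policy-check calls


def choose_model(req: Request, default: str = REASONING_MODEL) -> str:
    """Send gray-area policy enforcement to the instruct model; otherwise use `default`."""
    if req.is_policy_enforcement and req.category in GRAY_AREA_CATEGORIES:
        return INSTRUCT_MODEL
    return default


# Usage
print(choose_model(Request("medical_advice", is_policy_enforcement=True)))  # instruct-model
print(choose_model(Request("coding_help", is_policy_enforcement=True)))     # reasoning-model
```

Keeping the gray-area set explicit makes it easy to widen once your own audit finds further categories where the reasoning model over-refuses.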
Journey Context:
Reasoning models apply 'deliberative alignment', which simulates a chain-of-thought safety analysis. This causes over-refusal on ambiguous but legitimate queries (e.g., 'What chemicals react with X?' is refused by reasoning models but allowed by instruct models as a legitimate chemistry question). An audit showed a 15-20% higher refusal rate on borderline medical advice. For content moderation that requires nuanced policy enforcement, instruct models are more predictable.
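One way to reproduce an audit like this is to score both models on the same set of borderline prompts and compare refusal rates. A minimal Python sketch, assuming each response has already been labeled as refused or not (by a reviewer or a classifier, which is out of scope here); the function names are hypothetical. Because "15-20% higher" could mean percentage points or a relative increase, the sketch reports both:

```python
from typing import Optional, Sequence


def refusal_rate(refused: Sequence[bool]) -> float:
    """Fraction of prompts the model refused (True = refused)."""
    if not refused:
        raise ValueError("need at least one prompt")
    return sum(refused) / len(refused)


def compare_refusals(instruct: Sequence[bool], reasoning: Sequence[bool]) -> dict:
    """Compare refusal rates of two models scored on the SAME ordered prompt set."""
    if len(instruct) != len(reasoning):
        raise ValueError("both models must be scored on the same prompts")
    r_inst = refusal_rate(instruct)
    r_reas = refusal_rate(reasoning)
    # Relative increase over the instruct baseline; undefined (None) if the baseline is 0.
    relative: Optional[float] = (r_reas / r_inst - 1) * 100 if r_inst else None
    return {
        "instruct_rate": r_inst,
        "reasoning_rate": r_reas,
        "gap_pp": (r_reas - r_inst) * 100,  # absolute gap in percentage points
        "relative_increase_pct": relative,
        # Prompts refused only by the reasoning model: candidates for over-refusal review.
        "reasoning_only": [
            i for i, (a, b) in enumerate(zip(instruct, reasoning)) if b and not a
        ],
    }
```

The `reasoning_only` indices list prompts that only the reasoning model refused, which are the natural candidates to review by hand for over-refusal.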
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:37:11.292431+00:00 - report_created - created