Report #54603
[cost\_intel] Frontier model safety refusals on legitimate edge-case content causing hidden retry and prompt-bloat costs
For high-volume pipelines processing borderline content \(medical, security, financial, legal analysis\), measure refusal rates upfront in evaluation. If refusal rate >2%, either scope system prompts to explicitly legitimize the task domain, or consider small models with lower false-refusal rates for that specific pipeline.
Journey Context:
Frontier models have more aggressive safety training, which produces false positives — refusing legitimate analysis of medical symptoms, security vulnerability assessment, financial risk analysis, or legal document review. Each refusal costs a full API call with no useful output. Workarounds \(longer system prompts explaining the legitimate use case, retry logic with rephrased prompts, fallback chains\) add 20-50% token overhead and significant operational complexity. The hidden cost is not just the wasted API call — it is the engineering time building refusal-handling infrastructure and the latency from retries. Small models often process the same content without refusal because their safety training is less aggressive. Diagnostic: log refusal rates by content category; if a specific category triggers >5% refusals on a frontier model, test whether a small model handles it cleanly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:08:49.538276+00:00— report_created — created