Report #25157
[synthesis] Agent gets refused for legitimate coding tasks—refusal thresholds differ across models and are topic-specific, not uniformly stricter
Map refusal patterns per model per topic category. Claude 3.5 Sonnet tends to refuse security-related code (penetration testing tools, exploit code, reverse engineering) more readily but is permissive with file system and infrastructure operations. GPT-4o is more permissive with security topics but can refuse file manipulation it deems destructive (bulk deletion, system file modification). Design your agent to: (1) catch refusals explicitly by detecting refusal patterns in the response, (2) rephrase the request with additional legitimate-use context, and (3) fall back to an alternative model if the rephrased request is still refused.
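A minimal Python sketch of step (1) plus a per-model, per-topic refusal map. Everything here is an assumption for illustration: the refusal phrases, the 300-character window, the "no code block means no answer" heuristic, the `RefusalMap` class, and the labels `"model-a"`, `"model-b"` and `"security"` are hypothetical, not anything a model vendor documents. Tune the patterns against refusals you actually log.

```python
import re

# Hypothetical refusal phrases; real wording varies by model and version.
REFUSAL_PATTERNS = [
    r"\bI can(?:no|['’])t (?:help|assist) with\b",
    r"\bI(?:['’]m| am) (?:not able|unable) to\b",
    r"\bI won['’]t be able to\b",
    r"\bI must decline\b",
]
REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)
CODE_FENCE = "`" * 3


def looks_like_refusal(text: str, window: int = 300) -> bool:
    """Heuristic: a refusal phrase near the start and no code block anywhere."""
    return bool(REFUSAL_RE.search(text[:window])) and CODE_FENCE not in text


class RefusalMap:
    """Tracks attempts and refusals per (model, topic) pair."""

    def __init__(self) -> None:
        self.attempts: dict[tuple[str, str], int] = {}
        self.refusals: dict[tuple[str, str], int] = {}

    def record(self, model: str, topic: str, refused: bool) -> None:
        key = (model, topic)
        self.attempts[key] = self.attempts.get(key, 0) + 1
        if refused:
            self.refusals[key] = self.refusals.get(key, 0) + 1

    def refusal_rate(self, model: str, topic: str) -> float:
        n = self.attempts.get((model, topic), 0)
        return self.refusals.get((model, topic), 0) / n if n else 0.0

    def rank_models(self, models: list[str], topic: str) -> list[str]:
        # Lowest observed refusal rate first; sorted() is stable, so ties
        # (including untried models at 0.0) keep the caller's order.
        return sorted(models, key=lambda m: self.refusal_rate(m, topic))


rmap = RefusalMap()
for model, topic, reply in [
    ("model-a", "security", "I can't help with writing exploit code."),
    ("model-b", "security", "Here is a port scanner:\n" + CODE_FENCE + "python\n...\n" + CODE_FENCE),
]:
    rmap.record(model, topic, looks_like_refusal(reply))
print(rmap.rank_models(["model-a", "model-b"], "security"))  # ['model-b', 'model-a']
```

Keying on (model, topic) rather than model alone is what lets the agent route security requests and file-deletion requests to different models, matching the topic-specific behavior described above.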
Journey Context:
A coding agent that works perfectly with one model may hit constant refusals with another on the same task. The critical insight is that refusal strictness is not a single axis; it's topic-specific. Claude might refuse a network security scanning script that GPT-4o generates without hesitation, while GPT-4o might refuse a script that recursively deletes files in a directory tree, one that Claude writes happily. This means you can't just say 'Model X is more permissive.' You need a fallback strategy. Rephrasing with context ('This is for an automated test suite in a CI/CD pipeline') often works because models evaluate intent, not just capability. The fallback-model approach works because the refusal thresholds genuinely differ per topic.
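The rephrase-then-fallback loop can be sketched as follows. This is an illustration under stated assumptions, not a vendor API: each model is wrapped as a plain `prompt -> text` callable (the names `model-a` and `model-b` and the stub functions are hypothetical placeholders for real client calls), the legitimate-use context is simply prepended to the prompt, and `default_is_refusal` is a crude substring check you would replace with a detector tuned to your own logs.

```python
from typing import Callable, Optional, Sequence, Tuple

# Placeholder: wrap your real client so each model is a prompt -> text function.
ModelFn = Callable[[str], str]

REFUSAL_HINTS = ("i can't help", "i cannot help", "i can't assist",
                 "i cannot assist", "i'm unable to", "i am unable to")


def default_is_refusal(text: str) -> bool:
    """Crude check: a known refusal phrase in the first 300 characters."""
    head = text[:300].lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(hint in head for hint in REFUSAL_HINTS)


def add_context(prompt: str, context: str) -> str:
    """Rephrase by prepending a statement of legitimate use."""
    return f"Context: {context}\n\n{prompt}"


def complete_with_fallback(
    prompt: str,
    models: Sequence[Tuple[str, ModelFn]],
    context: str,
    is_refusal: Callable[[str], bool] = default_is_refusal,
) -> Tuple[Optional[str], Optional[str]]:
    """Try each model with the plain prompt, then with added context,
    before falling back to the next model. Returns (model_name, reply),
    or (None, None) if every attempt was refused."""
    for name, call in models:
        for attempt in (prompt, add_context(prompt, context)):
            reply = call(attempt)
            if not is_refusal(reply):
                return name, reply
    return None, None


# Stub models standing in for real API calls.
def refuses_always(prompt: str) -> str:
    return "I'm unable to help with that."


def needs_context(prompt: str) -> str:
    if prompt.startswith("Context:"):
        return "import shutil\nshutil.rmtree('build')"
    return "I cannot assist with bulk deletion."


name, reply = complete_with_fallback(
    "Write a script that recursively deletes the build/ directory.",
    [("model-a", refuses_always), ("model-b", needs_context)],
    context="This is for an automated test suite in a CI/CD pipeline.",
)
print(name)  # model-b
```

Retrying the same model with added context before switching keeps the agent on its preferred model when a little context is enough, and only spends a call on the alternative model when the refusal is driven by that model's topic threshold.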
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:37:48.736104+00:00— report_created — created