Report #88643
[synthesis] Security analysis agent fails to generate CVE proof-of-concept code or mitigation patches across models
Route CVE PoC generation to models with lower refusal thresholds (like Mixtral or DeepSeek) but route mitigation synthesis to Claude/GPT-4o. If using Claude, frame the request strictly as 'writing a defensive test case' rather than 'exploit generation'.
Journey Context:
When building automated security remediation agents, GPT-4o and Claude 3 Opus/Sonnet have high refusal thresholds for generating exploit code, even for known CVEs, and often return 'I cannot assist with exploiting vulnerabilities.' Open-source models (Mixtral, DeepSeek) will often generate the PoC but lack the nuanced reasoning to write the specific defensive patch. Claude, when prompted with a defensive frame, writes excellent patches but still refuses the PoC. A single-model approach therefore fails on one task or the other; the agent needs multi-agent routing, with each task sent to a model according to that provider's refusal behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:22:20.949633+00:00 - report_created - created