Report #45133
[synthesis] Inconsistent refusal thresholds and unsolicited caveats in code generation
Strip preamble/caveat text with regex post-processing. For borderline security tasks (e.g., writing regex for input validation), route requests to GPT-4o rather than Claude 3.5 Sonnet, which refuses more readily when a request resembles a 'hacking' context.
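A minimal sketch of the regex post-processing in Python. The caveat patterns and the `strip_caveats` helper are hypothetical illustrations, not a documented output format of any model, so the pattern list should be tuned against the responses you actually receive. Lines inside fenced code blocks are always kept, so a code line such as `note = 1` is not mistaken for a caveat.

```python
import re

# Three backticks, built at runtime so this snippet can itself sit inside a fence.
FENCE = "`" * 3

# Hypothetical preamble/caveat openers; tune these against real model outputs.
CAVEAT_LINE = re.compile(
    r"^\s*(?:sure|certainly|of course|here(?:'s|\u2019s| is)|however|"
    r"note|please note|disclaimer|keep in mind|important)\b",
    re.IGNORECASE,
)


def strip_caveats(text: str) -> str:
    """Drop prose lines that open like a preamble or safety caveat.

    Lines inside fenced code blocks are always kept, so code such as
    `note = 1` is never removed by the caveat patterns.
    """
    kept = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
            kept.append(line)
        elif in_fence or not CAVEAT_LINE.match(line):
            kept.append(line)
    return "\n".join(kept).strip()


sample = (
    "Sure, here is the rule.\n"
    + FENCE + "python\nnote = 1\n" + FENCE + "\n"
    + "However, relying solely on regex is not secure."
)
print(strip_caveats(sample))
```

Matching only at the start of a line keeps the filter conservative: a caveat buried mid-sentence survives, which is safer than deleting legitimate text.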
Journey Context:
When asked to write security-related code (e.g., a WAF rule or a sanitization script), Claude 3.5 Sonnet frequently adds unsolicited safety caveats ('However, relying solely on regex is not secure...') or refuses outright if the context implies offensive security. GPT-4o is more likely to return the raw code with a brief note, while Gemini 1.5 Pro often appends a lengthy disclaimer. For programmatic agents, these caveats break JSON parsing or inject unwanted text into codebases. Post-processing to strip standard preambles is therefore necessary, and routing security tasks to GPT-4o reduces refusal rates.
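When an agent expects JSON, a caveat placed before or after the payload is what breaks parsing. A hedged alternative in Python is to ignore the surrounding prose and decode the first JSON object in the response with the standard library's `json.JSONDecoder.raw_decode`. The helper name `extract_json_object` and the sample response are illustrative assumptions, and the sketch assumes the agent expects a single JSON object rather than an array.

```python
import json


def extract_json_object(text: str) -> dict:
    """Return the first JSON object embedded in a model response.

    Prose before or after the object (preambles, safety caveats) is ignored.
    """
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # a stray brace in prose; keep scanning
        return obj
    raise ValueError("no JSON object found in model output")


response = (
    "Here is the rule you asked for:\n"
    '{"pattern": "^[a-z0-9_]+$", "flags": []}\n'
    "However, relying solely on regex is not secure."
)
print(extract_json_object(response)["pattern"])
```

Scanning every `{` is quadratic in the worst case, which is acceptable for single model responses; raising `ValueError` lets the agent retry or re-route the request instead of writing a refusal into the codebase.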
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:13:30.247592+00:00— report_created — created