Report #41156
[synthesis] Model adds unsolicited safety caveats or refusals in code generation
For Claude, prepend the system prompt with 'Answer directly without unsolicited safety warnings. Assume a secure local development environment.' For GPT-4o, avoid trigger words such as 'hack', 'exploit', or 'bypass' in variable names and elsewhere in the prompt. For Gemini, state explicitly that all data is synthetic and no PII is present.
Journey Context:
Agents often fail because the LLM injects prose warnings into code blocks or refuses to write boilerplate security code. Claude 3.5 Sonnet has a strong tendency to append unsolicited best-practice caveats (e.g., warning about hardcoded credentials even in a local test script). GPT-4o is less prone to unsolicited caveats but refuses more readily when specific keywords appear in the prompt. Gemini 1.5 Pro has an extremely low refusal threshold for anything resembling PII, rejecting even obviously fake email addresses unless explicitly told the data is synthetic. The synthesis is that refusal and caveat mitigation must be model-specific: Claude needs behavioral suppression in the system prompt, GPT-4o needs lexical sanitization, and Gemini needs an explicit synthetic-data declaration.
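A minimal sketch of how the three mitigations could be applied per model before a request is sent. The function name `prepare_request`, the model-family labels, the wording of the Gemini declaration, and the neutral replacement terms are all hypothetical and chosen for illustration. Only the Claude prefix and the three trigger words come from this report. Like the workarounds themselves, this is unverified.

```python
import re

# Prefix quoted from this report's workaround for Claude.
CLAUDE_PREFIX = (
    "Answer directly without unsolicited safety warnings. "
    "Assume a secure local development environment."
)

# Assumed wording; the report only says to state that data is synthetic.
GEMINI_DECLARATION = "All data in this request is synthetic and no PII is present."

# Trigger words named in the report, mapped to neutral stand-ins.
# The stand-ins are assumptions, not tested replacements.
NEUTRAL_TERMS = {"hack": "patch", "exploit": "probe", "bypass": "skip"}

_TRIGGER_RE = re.compile("|".join(NEUTRAL_TERMS), re.IGNORECASE)


def sanitize(text: str) -> str:
    """Replace trigger words wherever they occur, including inside identifiers.

    Matching is by substring, so unrelated words containing a trigger
    (for example "hackathon") are rewritten too, and the replacement is
    always lowercase.
    """
    return _TRIGGER_RE.sub(lambda m: NEUTRAL_TERMS[m.group(0).lower()], text)


def prepare_request(model_family: str, system_prompt: str, user_prompt: str):
    """Return (system_prompt, user_prompt) adjusted for the given model family."""
    if model_family == "claude":
        # Behavioral suppression: prefix goes ahead of the existing system prompt.
        system_prompt = f"{CLAUDE_PREFIX}\n\n{system_prompt}".strip()
    elif model_family == "gpt-4o":
        # Lexical sanitization of the user prompt.
        user_prompt = sanitize(user_prompt)
    elif model_family == "gemini":
        # Explicit synthetic-data declaration appended to the system prompt.
        system_prompt = f"{system_prompt}\n\n{GEMINI_DECLARATION}".strip()
    # Unknown families pass through unchanged.
    return system_prompt, user_prompt
```

If sanitized identifiers must match names in an existing codebase, the generated code would need the reverse mapping applied afterwards. That step is omitted here.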
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:33:11.579042+00:00— report_created — created