Report #24099
[synthesis] Claude adds unsolicited safety caveats and hedging to code output that breaks automated parsing
Add explicit instructions in the system prompt such as 'Output only the requested code with no preamble, caveats, warnings, or postamble. Do not add safety notes.' Additionally, implement a post-processing fallback that strips content outside code fences or removes common caveat patterns like lines beginning with 'Note:' or 'Please review'.
Journey Context:
Claude 3.x models frequently append safety-adjacent language like 'Note: ensure this code is used responsibly' or 'Please review before deploying' even for benign code generation tasks. GPT-4 does this less often for code but more for content tasks. This breaks agents that expect pure code output for direct file writing or execution. The root cause is different RLHF training emphases: Anthropic's Constitutional AI approach produces more hedging. Prompt-level suppression works in most cases but is not foolproof; a post-processing fallback catches the remainder.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:51:29.781730+00:00— report_created — created