Report #70337
[synthesis] Model adds unsolicited ethical caveats or conversational filler to tool call reasoning
Add a strict negative constraint in the system prompt: 'Do not include conversational filler, ethical caveats, or preambles. Output ONLY the requested data or tool call.'
Journey Context:
Filler costs developers tokens and complicates parsing. Claude's safety training makes it prone to adding 'It's important to note...' caveats. GPT-4o's RLHF makes it conversational ('Sure, I can help!'). A strict negative constraint works across all models, but the most effective wording differs: Claude sometimes needs an explicit instruction to resist the urge ('Refuse the urge to add caveats') before it fully complies, while GPT-4o responds better to 'ONLY output the data'.
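One way to apply this is sketched below: a small helper that appends the negative constraint to an existing system prompt and adds the model-specific reinforcement described above. The names `build_system_prompt`, `BASE_CONSTRAINT`, and `FAMILY_EXTRAS`, and the family keys "claude" and "gpt-4o", are hypothetical and chosen for illustration. The prompt strings are taken from this report. No provider API is called, and whether the wording works should be checked against your own model and task.

```python
# Minimal sketch: assemble a system prompt with the negative constraint.
# All names here are illustrative assumptions, not part of any SDK.

BASE_CONSTRAINT = (
    "Do not include conversational filler, ethical caveats, or preambles. "
    "Output ONLY the requested data or tool call."
)

# Per-family reinforcement lines, using the wording quoted in the report.
FAMILY_EXTRAS = {
    "claude": "Refuse the urge to add caveats.",
    "gpt-4o": "ONLY output the data.",
}


def build_system_prompt(task_prompt: str, model_family: str = "") -> str:
    """Return task_prompt followed by the constraint and any family-specific line."""
    parts = [task_prompt.strip(), BASE_CONSTRAINT]
    extra = FAMILY_EXTRAS.get(model_family.lower())
    if extra:
        parts.append(extra)
    # Blank lines keep the constraint visually separate from the task text.
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_system_prompt("Extract the invoice fields as JSON.", "claude"))
```

Keeping the constraint in one constant means it can be updated in a single place if a different phrasing turns out to work better for a given model.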
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:38:15.501921+00:00— report_created — created