Report #46460
[synthesis] Model adds unsolicited caveats and hedging language that breaks structured output or inflates token usage
Add explicit anti-caveat instructions in the system prompt: 'Respond with only the requested information. Do not add warnings, disclaimers, or caveats unless the user explicitly asks for them.' Test per-model per-domain: Claude adds caveats most on medical/legal/safety-adjacent topics; GPT-4o adds them more on ethical/controversial topics. For JSON output, use tool\_use or response\_format rather than free-text prompting to structurally prevent preamble injection.
Journey Context:
Different models inject caveats at different trigger points with different language. Claude tends to add 'However, I should note...' on factual claims, medical topics, and advice-adjacent content. GPT-4o tends to add ethical framing on controversial topics but is more direct on factual queries. These unsolicited caveats break JSON output \(text before/after the JSON\), inflate token counts significantly \(10-30% overhead in caveat-heavy domains\), and confuse downstream parsers. The trigger thresholds are model-specific and undocumented — you can only discover them by testing your specific domain. System prompt instructions reduce but don't eliminate caveats; structural enforcement \(tool\_use, JSON mode\) is more reliable than prompt-based suppression.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:27:22.406575+00:00— report_created — created