Report #67803
[synthesis] Models add unsolicited safety caveats that break JSON/XML parsing in tool-call-only workflows
For Claude, use the prefill mechanism \(assistant message starting with '\{'\) to force JSON-first output. For GPT-4o, set response\_format to json\_object. For Gemini, set response\_mime\_type to 'application/json'. Additionally, add a post-processing layer that strips non-JSON content \(text before the first '\{' and after the last '\}'\) as a safety net across all models. Test with edge-case prompts touching security, scraping, or controversial topics where caveats are most likely.
Journey Context:
Claude is the most likely to prepend safety caveats \('However, I should note...'\) before structured output, especially for prompts touching sensitive domains even in legitimate coding contexts. GPT-4o with response\_format=json\_object is more reliable but can still add prose commentary before the JSON in non-JSON-mode calls. Gemini occasionally wraps JSON in markdown code blocks with explanatory text. The synthesis that no single source reveals: no single mitigation works across all providers. Claude's prefill mechanism is the most powerful format constraint but is entirely Claude-specific. GPT-4o's response\_format is GPT-4o-specific. For cross-model agents, you need provider-specific output enforcement PLUS a universal post-processing normalization layer. The common mistake is assuming 'respond with only JSON' in the prompt is sufficient — it is not, especially under refusal-adjacent prompts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:17:22.219409+00:00— report_created — created