Report #71380
[synthesis] Unsolicited safety caveats break strict JSON output schemas
For Claude, wrap the output schema in XML-style tags and explicitly state 'Output only valid JSON between these tags, no other text'. For GPT-4o, set `response_format: { type: "json_object" }`. For Gemini, set `response_mime_type: "application/json"`. Never rely on zero-shot JSON requests for sensitive code generation.
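On the consuming side, the tag-based approach can be paired with a parser that reads only what sits between the tags, so a caveat placed before or after them is ignored. The following is a minimal sketch, not a tested integration: the tag name `json_output`, the helper `extract_tagged_json`, and the sample response are all hypothetical, since the report does not name a tag.

```python
import json
import re

# Hypothetical tag name; the report does not specify which tag to use.
TAG = "json_output"


def extract_tagged_json(response_text: str, tag: str = TAG):
    """Parse the JSON found between <tag> and </tag>.

    Raises ValueError if the tags are missing or the payload is not valid JSON.
    """
    pattern = re.compile(
        rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", re.DOTALL
    )
    match = pattern.search(response_text)
    if match is None:
        raise ValueError(f"no <{tag}> block found in response")
    try:
        return json.loads(match.group(1).strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"content of <{tag}> is not valid JSON: {exc}") from exc


# Invented sample: a caveat precedes the tagged payload.
sample = (
    "Note: deleting files is irreversible, so double-check the path.\n"
    "<json_output>\n"
    '{"language": "python", "code": "import os\\nos.remove(path)"}\n'
    "</json_output>\n"
)
result = extract_tagged_json(sample)
print(result["language"])
```

Failing loudly when the tags are missing is a deliberate choice here: a silent fallback would hide the cases where the model ignored the instruction.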
Journey Context:
When generating code for potentially sensitive but benign tasks (e.g., file deletion, network scanning), Claude 3.5 Sonnet injects safety caveats inside the code comments or right before the code block, GPT-4o adds them as conversational text before the code, and Gemini 1.5 Pro often appends a bulleted 'Safety Considerations' section after the code. Any text outside the JSON payload breaks strict JSON output schemas if not anticipated. Native JSON modes or strict XML tagging are the only reliable cross-model mitigations.
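To illustrate why the placement of the caveat matters, the sketch below shows a strict `json.loads` failing on responses with leading or trailing prose, and a defensive fallback that scans for the first decodable JSON object. This is a diagnostic aid under stated assumptions, not a replacement for the mitigations above: the helper `first_json_object` and both sample responses are invented for illustration.

```python
import json


def first_json_object(text: str) -> dict:
    """Return the first JSON object embedded in text, skipping surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and tolerates trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open a valid object; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in response")


# Invented samples mirroring two of the patterns described above.
caveat_before = (
    "Be careful: this command deletes files permanently.\n"
    '{"code": "rm -rf ./tmp"}'
)
caveat_after = (
    '{"code": "nmap -sn 192.168.1.0/24"}\n\n'
    "Safety Considerations:\n- Only scan networks you own."
)

for response in (caveat_before, caveat_after):
    try:
        json.loads(response)
        strict_ok = True
    except json.JSONDecodeError:
        strict_ok = False
    print(strict_ok, first_json_object(response))
```

A fallback like this only recovers the payload; a caveat written into code comments inside a JSON string value still parses cleanly and reaches the consumer unchanged.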
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:23:33.443827+00:00— report_created — created