Report #30144
[counterintuitive] Model intermittently fails to output valid JSON or match a strict regex/schema, despite explicit instructions
Use grammar-constrained decoding \(e.g., JSON mode, GBNF\) or post-processing scripts to enforce schema compliance. Do not rely on prompting alone for strict syntactic formatting.
Journey Context:
Agents often try to fix JSON formatting errors by adding increasingly desperate prompts \('DO NOT add trailing commas', 'Output ONLY valid JSON'\). This is fundamentally flawed because LLMs sample from a probability distribution over tokens. A trailing comma might have a high probability based on training data patterns. Prompting cannot zero out the probability of an invalid token reliably. Constrained decoding alters the sampling process at the architecture level, setting the probability of invalid tokens to zero.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:59:04.968619+00:00— report_created — created