Report #35868
[tooling] LLM producing invalid JSON or requiring expensive regex post-processing in local inference
Use a GBNF grammar file with the `--grammar-file` flag (or pass the grammar inline with `--grammar`) to constrain token generation to valid JSON at the sampling level. The canonical `json.gbnf` shipped with llama.cpp covers generic JSON, so syntactically valid output comes from a single generation, with no tokens spent on retries.
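As a rough illustration of the invocation, the sketch below assembles a llama.cpp command line that attaches a grammar file. The binary name `llama-cli`, the model path, the grammar path, and the helper names `build_llama_command` and `run_llama` are assumptions for illustration; adjust them to the local build. Only `--grammar-file` comes from this report; `-m`, `-n`, and `-p` are the usual model, token-limit, and prompt flags.

```python
import subprocess


def build_llama_command(model_path, grammar_path, prompt, n_predict=256):
    """Assemble an argv list that runs llama.cpp with a GBNF grammar attached.

    All paths are placeholders supplied by the caller (hypothetical here).
    """
    return [
        "llama-cli",                      # assumed binary name; older builds used `main`
        "-m", model_path,                 # model file
        "--grammar-file", grammar_path,   # GBNF grammar that constrains sampling
        "-n", str(n_predict),             # maximum number of tokens to generate
        "-p", prompt,                     # prompt text
    ]


def run_llama(cmd):
    """Run the command and return raw stdout.

    Depending on the build and flags, stdout may contain more than the
    generated JSON (for example the echoed prompt), so inspect it before parsing.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Example argv with placeholder paths (nothing is executed here):
example_cmd = build_llama_command(
    "models/model.gguf", "grammars/json.gbnf", "Return a JSON object describing a cat."
)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the prompt contains quotes or braces.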
Journey Context:
Without grammar constraints, models generate invalid JSON (~10-30% of the time), which forces wasteful retry loops or fragile regex repair. A GBNF grammar masks the logits at each sampling step so that only tokens keeping the output inside the grammar can be chosen, which yields syntactically correct output in one pass, provided generation is not cut off by the token limit. Many users don't know that llama.cpp supports full GBNF (not just a JSON mode), so custom grammars can enforce a specific schema rather than generic JSON.
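The masking idea can be sketched with a toy decoder. This is a simplified illustration, not llama.cpp's implementation: the grammar here is a hand-written state machine over whole tokens, and `fake_logits`, `GRAMMAR`, `VOCAB`, and `constrained_greedy_decode` are hypothetical names. The stand-in model always prefers the invalid token `hello`, yet the mask only lets grammar-legal tokens through.

```python
import json
import math

# Toy grammar as a state machine: accepts lists of single digits such as [1,2].
# Each state maps an allowed token to the state that follows it.
GRAMMAR = {
    "start": {"[": "open"},
    "open": {"1": "value", "2": "value", "]": "done"},
    "value": {",": "comma", "]": "done"},
    "comma": {"1": "value", "2": "value"},
    "done": {},
}

VOCAB = ["hello", "{", "[", "]", ",", "1", "2"]


def fake_logits(step):
    """Stand-in for a model: favours 'hello', and likes ']' more as steps pass."""
    scores = {"hello": 5.0, "{": 4.0, "[": 1.0, ",": 2.0, "1": 3.0, "2": 2.5}
    scores["]"] = 0.6 * step
    return scores


def constrained_greedy_decode(max_steps=20):
    """Greedy decoding where tokens outside the grammar get a logit of -inf.

    Returns the generated text and whether the grammar reached its end state.
    """
    state, out = "start", []
    for step in range(max_steps):
        if state == "done":
            break
        allowed = GRAMMAR[state]
        logits = fake_logits(step)
        # Mask: keep the score for grammar-legal tokens, -inf for everything else.
        masked = {t: (logits[t] if t in allowed else -math.inf) for t in VOCAB}
        token = max(VOCAB, key=lambda t: masked[t])
        out.append(token)
        state = allowed[token]
    return "".join(out), state == "done"


text, complete = constrained_greedy_decode()
if complete:
    parsed = json.loads(text)  # parses, because every step stayed inside the grammar
```

Calling `constrained_greedy_decode(max_steps=3)` stops before the closing bracket and returns `complete == False`, which mirrors the caveat that a grammar cannot rescue output truncated by the token limit.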
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:41:04.187069+00:00— report_created — created