Report #36685
[tooling] JSON mode fails with malformed output or requires expensive regex post-processing
Use `--grammar-file` with a GBNF (GGML BNF) grammar definition (e.g., `json.gbnf`) to constrain the sampler to valid JSON at the token level, eliminating post-processing.
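A minimal Python sketch of the invocation. It assumes a llama.cpp build that provides a `llama-cli` binary and a checkout containing `grammars/json.gbnf`. The model path and the helper names `build_llama_command` and `run_llama` are hypothetical; only `-m`, `--grammar-file`, `-p` and `-n` are llama.cpp flags. Binary names and default interactive behavior have changed between llama.cpp releases, so check `llama-cli --help` for your build before relying on this.

```python
import shlex
import subprocess


def build_llama_command(model_path, grammar_path, prompt, n_predict=256):
    """Build an argv list for llama.cpp's CLI with grammar-constrained sampling."""
    return [
        "llama-cli",                     # binary name varies by llama.cpp version/build (assumption)
        "-m", model_path,                # GGUF model file
        "--grammar-file", grammar_path,  # GBNF grammar, e.g. llama.cpp's grammars/json.gbnf
        "-p", prompt,
        "-n", str(n_predict),            # max tokens to generate; too low a cap can cut the JSON short
    ]


def run_llama(cmd):
    """Run the command and return its stdout (raises CalledProcessError on failure)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


cmd = build_llama_command(
    "models/your-model.gguf",            # hypothetical model path
    "grammars/json.gbnf",
    "List three primary colors as a JSON array of strings.",
)
print(shlex.join(cmd))
# output = run_llama(cmd)  # uncomment once the model and grammar paths exist
```

Building the argument list separately from running it keeps the grammar flag easy to inspect and avoids shell-quoting problems with the prompt.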
Journey Context:
Many users get "JSON mode" from prompt instructions or post-hoc regex validation, which is fragile and wastes tokens on completions that must be discarded. llama.cpp instead implements GBNF grammars, which restrict the sampler's next-token choice to tokens the grammar rules allow. For JSON, you provide a grammar file that defines object, array, and string syntax. The constraint acts at the logits level: any token that would violate the grammar has its logit set to -inf, i.e. zero probability. Because of this, valid syntax does not depend on the model following instructions, and it works offline, unlike hosted options such as OpenAI's JSON mode. The key is using the `--grammar-file` flag instead of manual prompt engineering.
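The logit-masking idea can be illustrated with a self-contained Python sketch. It does not use llama.cpp or parse GBNF: the tiny array-of-integers grammar, its hand-written state machine, the `mask_logits` helper and the bigram "model" are all hypothetical stand-ins, and the real `json.gbnf` grammar and llama.cpp's sampler are far more involved. What it shows is the mechanism described above: before each pick, every token that would take the output outside the grammar gets a logit of -inf, so even a model that prefers chatty text can only emit grammar-valid output.

```python
import json
import math

# Hypothetical toy grammar standing in for a GBNF file such as json.gbnf:
#   root   ::= "[" ( number ( "," number )* )? "]"
#   number ::= [0-9]+
DIGITS = "0123456789"
EOS = "<eos>"


def advance(state, ch):
    """Feed one character to the grammar's state machine; None means rejected."""
    if state == "start" and ch == "[":
        return "open"
    if state in ("open", "comma", "num") and ch in DIGITS:
        return "num"
    if state == "num" and ch == ",":
        return "comma"
    if state in ("open", "num") and ch == "]":
        return "done"
    return None


def grammar_state(text):
    """Return the state after consuming `text`, or None if it is not a valid prefix."""
    state = "start"
    for ch in text:
        state = advance(state, ch)
        if state is None:
            return None
    return state


def mask_logits(prefix, vocab, logits):
    """Set the logit of every token that would break the grammar to -inf."""
    state = grammar_state(prefix)
    masked = []
    for tok, logit in zip(vocab, logits):
        if tok == EOS:
            ok = state == "done"  # end of sequence only once the root rule is complete
        else:
            ok = grammar_state(prefix + tok) is not None
        masked.append(logit if ok else -math.inf)
    return masked


VOCAB = ["[", "]", ",", "1", "42", "Sure", "!", EOS]

# Hypothetical bigram "model": logits depend only on the previous token and
# deliberately rank chatty, non-JSON tokens highest. Unlisted tokens get 0.0.
BIGRAM = {
    None: {"Sure": 3.0, "[": 1.0},
    "[": {"Sure": 2.0, "1": 1.0},
    "1": {"!": 2.0, ",": 1.0},
    ",": {"Sure": 2.0, "42": 1.0},
    "42": {"!": 2.0, "]": 1.0},
    "]": {"Sure": 2.0, EOS: 1.0},
}


def toy_logits(tokens):
    prefs = BIGRAM.get(tokens[-1] if tokens else None, {})
    return [prefs.get(tok, 0.0) for tok in VOCAB]


def constrained_greedy(max_steps=32):
    """Greedy decoding over masked logits; stops at EOS or after max_steps tokens."""
    tokens = []
    for _ in range(max_steps):
        masked = mask_logits("".join(tokens), VOCAB, toy_logits(tokens))
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if masked[best] == -math.inf:
            raise RuntimeError("grammar allows no continuation")
        if VOCAB[best] == EOS:
            break
        tokens.append(VOCAB[best])
    return "".join(tokens)


result = constrained_greedy()
print(result)              # [1,42]
print(json.loads(result))  # [1, 42]
```

Without the mask, the same greedy loop would pick `Sure` at the very first step; with it, the only surviving choices at each step are grammar-valid, so the output parses without any regex cleanup.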
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:03:22.404058+00:00— report_created — created