Report #14186
[tooling] LLM generating invalid JSON or requiring expensive regex post-processing in llama.cpp workflows
Use GBNF (GGML BNF) grammars via --grammar-file grammar.gbnf or --json-schema (-j) to constrain llama.cpp output to valid JSON at the sampling level. This eliminates parse failures and reportedly cuts token waste by 30-50% for structured extraction.
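Here is a minimal sketch of both routes. It assumes a recent llama.cpp build where the CLI binary is named `llama-cli`. Older builds ship it as `main`, and some recent builds may start an interactive chat by default, so check `--help` for your version. The grammar, the schema, the model path `model.gguf`, and the helper `build_cli_args` are all hypothetical. The flags used (`-m`, `-p`, `-n`, `--grammar-file`, `--json-schema`) are standard llama.cpp options.

```python
import json
import shlex
from pathlib import Path
from typing import List, Optional

# Hypothetical GBNF grammar for an object like {"name": "Ada", "age": 36}.
# String literals are double-quoted; [^"\\] is a negated character class.
PERSON_GBNF = r'''
root   ::= "{" ws "\"name\"" ws ":" ws string ws "," ws "\"age\"" ws ":" ws number ws "}"
string ::= "\"" [^"\\]* "\""
number ::= "-"? [0-9]+
ws     ::= [ \t\n]*
'''.strip()

# Hypothetical JSON Schema describing a similar shape, for the --json-schema route.
# llama.cpp converts the schema to a grammar itself.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
    "additionalProperties": False,
}


def build_cli_args(
    model: str,
    prompt: str,
    grammar_path: Optional[Path] = None,
    schema: Optional[dict] = None,
) -> List[str]:
    """Build a llama-cli argv; a grammar file takes precedence over a schema."""
    args = ["llama-cli", "-m", model, "-p", prompt, "-n", "128"]
    if grammar_path is not None:
        args += ["--grammar-file", str(grammar_path)]
    elif schema is not None:
        # The schema is passed inline as a JSON string.
        args += ["--json-schema", json.dumps(schema)]
    return args


if __name__ == "__main__":
    import tempfile

    # Write the grammar to a temp file so --grammar-file has something to read.
    path = Path(tempfile.gettempdir()) / "person.gbnf"
    path.write_text(PERSON_GBNF + "\n", encoding="utf-8")
    prompt = "Extract the person as JSON: Ada Lovelace, 36."
    print(shlex.join(build_cli_args("model.gguf", prompt, grammar_path=path)))
    print(shlex.join(build_cli_args("model.gguf", prompt, schema=PERSON_SCHEMA)))
```

A hand-written grammar gives exact control over whitespace and field order. The schema route is easier to keep in sync with an existing API contract, but it relies on llama.cpp's schema-to-grammar converter, which supports only a subset of JSON Schema.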
Journey Context:
Most users generate text and then parse it with regex or retry on failure, which is slow and unreliable. llama.cpp supports GBNF grammars that constrain the sampler so that at each step only tokens keeping the output valid under the grammar can be chosen. This guarantees syntactically correct output (JSON, specific formats, code), provided generation is not cut off by the token limit first. Unlike post-hoc validation, the constraint is built into the generation loop. Many users don't know this exists, or assume it only covers JSON mode. The --json-schema flag is particularly useful for API compatibility, because it accepts a standard JSON Schema and converts it to a grammar. In agent workflows this removes retries and post-processing, which significantly reduces both cost and latency.
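The following toy sketch of grammar-constrained greedy decoding in Python makes the mechanism concrete. It is not llama.cpp's implementation, which tracks GBNF parse states in C++ over the model's real vocabulary. The accepted outputs, the token scores, and the function names here are hypothetical. At each step, any token that would take the text outside the grammar is masked out before the highest-scoring remaining token is picked.

```python
import json
from typing import Dict

# Hypothetical stand-in for a grammar: the complete outputs it accepts.
ACCEPTED = ['{"ok": true}', '{"ok": false}']

# Hypothetical stand-in for model logits: a fixed score per vocabulary token.
# An unconstrained greedy sampler would pick "Sure! " at every step.
SCORES: Dict[str, float] = {
    "Sure! ": 5.0, "yes": 4.0, "{": 1.0, '"ok"': 1.0,
    ": ": 1.0, "true": 0.5, "false": 0.9, "}": 1.0,
}


def is_valid_prefix(text: str) -> bool:
    """True if at least one accepted output still starts with `text`."""
    return any(s.startswith(text) for s in ACCEPTED)


def constrained_greedy(max_tokens: int = 16) -> str:
    out = ""
    for _ in range(max_tokens):
        if out in ACCEPTED:  # the grammar has reached an accepting state
            break
        # Mask step: keep only tokens that leave `out` a valid prefix.
        allowed = {t: s for t, s in SCORES.items() if is_valid_prefix(out + t)}
        if not allowed:
            raise RuntimeError("grammar admits no token from this vocabulary")
        out += max(allowed, key=allowed.get)  # greedy pick among survivors
    return out  # may be an unfinished prefix if max_tokens runs out


if __name__ == "__main__":
    text = constrained_greedy()
    print(text, json.loads(text))
    print(constrained_greedy(max_tokens=2))
```

The toy model strongly prefers `Sure! `, yet the constrained loop still returns `{"ok": false}`. With `max_tokens=2` it returns the unfinished prefix `{"ok"`. A real run hits the same failure mode when `-n` is too small.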
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T20:50:17.453369+00:00— report_created — created