Report #41587
[tooling] LLM generating invalid JSON or code syntax requiring expensive retry loops
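Use `--grammar-file grammars/json.gbnf` (or an inline grammar via `--grammar`) to constrain token sampling at the logit level. Every token the sampler emits then keeps the output inside the grammar, so a generation that runs to completion is syntactically valid (e.g., well-formed JSON, or any pattern the grammar describes) and needs no retry loop. The guarantee covers syntax only: output cut off by the token limit can still be incomplete, and the grammar does not check that the values make sense.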
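A minimal sketch of wiring the flag into a script, in Python. The binary name `llama-cli`, the model path, and the prompt are assumptions for illustration (older builds name the binary differently), and `build_command` and `run` are hypothetical helpers, not part of llama.cpp. Only the flags `-m`, `-n`, `-p`, `--grammar-file`, and `--grammar` come from the llama.cpp CLI.

```python
import shlex
import subprocess


def build_command(model_path, prompt, grammar_file=None, grammar=None,
                  n_predict=256, binary="llama-cli"):
    """Assemble an argv list for a grammar-constrained run.

    Exactly one of grammar_file (path to a .gbnf file) or grammar
    (inline GBNF text) must be given. The binary name is an assumption.
    """
    if (grammar_file is None) == (grammar is None):
        raise ValueError("pass exactly one of grammar_file or grammar")
    if grammar_file is not None:
        grammar_args = ["--grammar-file", grammar_file]
    else:
        grammar_args = ["--grammar", grammar]
    return [binary, "-m", model_path, *grammar_args,
            "-n", str(n_predict), "-p", prompt]


def run(command):
    """Run the command and return its stdout; raises on a non-zero exit code."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical paths and prompt, shown only to illustrate the resulting command line.
command = build_command(
    "models/model.gguf",
    "Describe a cat as a JSON object.",
    grammar_file="grammars/json.gbnf",
)
print(shlex.join(command))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the prompt or an inline grammar contains quotes, which GBNF rules such as `root ::= "yes" | "no"` always do.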
Journey Context:
Developers waste tokens on 'Please respond with valid JSON' prompts and on validate-and-retry loops. GBNF (GGML BNF) grammars in llama.cpp restrict the sampler to tokens that keep the output valid under the grammar, much as a parser accepts only input that fits its grammar. Critical distinction: this is not post-hoc validation. The check runs inside the sampling loop, so invalid tokens get probability 0. Tradeoff: roughly 5-10% slower token generation due to grammar state tracking, but total latency is usually far lower than with retry loops. Common error: passing a JSON schema where a grammar is expected. `--grammar-file` takes GBNF, so either convert the schema to GBNF first (the llama.cpp tree includes a `json_schema_to_grammar.py` converter for this) or use the provided json.gbnf for general JSON.
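The "probability 0 inside the sampling loop" idea can be illustrated with a toy sketch. This is not llama.cpp's implementation: the seven-token vocabulary, the hand-written state machine (standing in for the grammar `root ::= "[" (digit ("," digit)*)? "]"`), and the `fake_logits` model are all assumptions made up for the example.

```python
import math
import random

VOCAB = ["[", "]", ",", "0", "1", "2", "hello"]

# Hand-written state machine for: root ::= "[" (digit ("," digit)*)? "]"
# Each state maps the tokens allowed next to the state they lead to.
DIGITS = {"0": "after_digit", "1": "after_digit", "2": "after_digit"}
TRANSITIONS = {
    "start": {"[": "after_open"},
    "after_open": {**DIGITS, "]": "done"},
    "after_digit": {",": "after_comma", "]": "done"},
    "after_comma": dict(DIGITS),
}


def masked_probs(logits, allowed):
    """Softmax over the allowed tokens only; every other token gets probability 0."""
    peak = max(l for tok, l in zip(VOCAB, logits) if tok in allowed)
    weights = [math.exp(l - peak) if tok in allowed else 0.0
               for tok, l in zip(VOCAB, logits)]
    total = sum(weights)
    return [w / total for w in weights]


def fake_logits(rng):
    """Stand-in for a model: random scores, strongly biased towards the invalid token."""
    return [rng.gauss(0.0, 1.0) + (5.0 if tok == "hello" else 0.0) for tok in VOCAB]


def generate(rng, max_tokens=64):
    """Sample until the grammar is complete or the token budget runs out."""
    state, out = "start", []
    while state != "done" and len(out) < max_tokens:
        allowed = TRANSITIONS[state]
        probs = masked_probs(fake_logits(rng), allowed)
        # Sample only among tokens with non-zero probability.
        candidates = [(tok, p) for tok, p in zip(VOCAB, probs) if p > 0.0]
        token = rng.choices([t for t, _ in candidates],
                            weights=[p for _, p in candidates], k=1)[0]
        out.append(token)
        state = allowed[token]
    return "".join(out), state


text, final_state = generate(random.Random(0))
print(text, final_state)
```

Even though the fake model prefers `hello` at every step, that token can never be emitted, because the mask is applied before sampling rather than after. The returned state matters: if the token budget ends the loop before `done` is reached, the output is a valid prefix but not yet a complete document.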
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:16:27.770845+00:00— report_created — created