Report #76901
[tooling] LLM outputs malformed JSON with missing quotes, trailing commas, or schema violations despite 'respond in JSON' prompt
Use GBNF (GGML BNF) grammar constraints via llama.cpp's --grammar-file option or the server API's 'grammar' field. Define a grammar that strictly enforces your JSON schema (e.g., one generated by the json-schema-to-grammar converter). The sampler can then only pick tokens the grammar allows at each step, so any output that runs to completion is syntactically valid and parses without errors. The grammar does not protect against truncation at the token limit, and it does not check that the values are semantically correct.
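A minimal sketch of the server route, assuming a llama.cpp server is already running at http://localhost:8080 and exposes the /completion endpoint with 'prompt', 'n_predict' and 'grammar' request fields and a 'content' response field. The grammar is a hand-written illustration for one fixed object shape; PERSON_GRAMMAR, build_completion_payload and request_completion are hypothetical names, not part of llama.cpp.

```python
import json
import urllib.request

# Illustrative GBNF for one fixed shape: {"name": <string>, "age": <integer>}.
# Raw string so the backslashes reach the GBNF parser unchanged.
PERSON_GRAMMAR = r'''
root   ::= "{" ws "\"name\"" ws ":" ws string ws "," ws "\"age\"" ws ":" ws number ws "}"
string ::= "\"" [^"\\\x00-\x1F]* "\""
number ::= "0" | [1-9] [0-9]*
ws     ::= " "?
'''


def build_completion_payload(prompt, grammar, n_predict=128):
    """Build the JSON body for a grammar-constrained completion request."""
    return {"prompt": prompt, "n_predict": n_predict, "grammar": grammar}


def request_completion(payload, base_url="http://localhost:8080"):
    """POST the payload and parse the generated text as JSON.

    Assumes the server address above; raises if the server is unreachable
    or if generation was cut off before the grammar completed.
    """
    req = urllib.request.Request(
        base_url + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return json.loads(body["content"])


payload = build_completion_payload("Describe Ada Lovelace as JSON.", PERSON_GRAMMAR)
# person = request_completion(payload)  # requires a running server
```

The same grammar text could be saved to a file and passed on the command line with --grammar-file. Keep n_predict large enough for the whole object, since a truncated response will still fail to parse.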
Journey Context:
Prompt engineering for JSON is fragile; models produce invalid syntax, especially with nested objects or special characters. Post-validation and retry loops waste tokens and add latency. GBNF integrates grammar constraints directly into the sampling loop: at each step, the grammar accepts or rejects candidate tokens, so only valid continuations are generated. Common mistake: writing grammars by hand for complex schemas (error-prone). Better workflow: use the 'json-schema-to-grammar' script in the llama.cpp repo to generate the grammar from a JSON Schema (for example, a schema object taken from an OpenAPI spec), then review the result, since the converter may not support every schema feature. Tradeoff: grammar-constrained generation is slightly slower (an estimated ~10-15%, depending on the grammar) because of the per-token checks, but it eliminates retries and parsing failures. Critical for agentic workflows requiring reliable tool calls.
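The accept/reject step described above can be illustrated with a toy sketch. This is not llama.cpp's implementation: the "grammar" here is just a finite set of accepted strings, the vocabulary is a handful of made-up tokens, and fake_score is a stand-in for model logits that deliberately prefers invalid tokens. All names are hypothetical.

```python
import json

# Toy "grammar": the complete set of strings it accepts.
ACCEPTED = {'{"ok": true}', '{"ok": false}'}

# Toy vocabulary, including tokens that would break JSON.
VOCAB = ['{', '"ok"', "'ok'", ': ', 'true', 'false', 'True', ',', '}']

# Stand-in for model preferences: higher is more likely.
# The "model" favors single quotes, Python-style True, and stray commas.
PREFERENCE = {"'ok'": 5, 'True': 4, ',': 3, 'false': 2}


def fake_score(prefix, token):
    """Return a fixed score per token; a real model would condition on prefix."""
    return PREFERENCE.get(token, 1)


def is_valid_prefix(text):
    """True if some accepted string starts with text."""
    return any(s.startswith(text) for s in ACCEPTED)


def constrained_greedy_decode(score_fn, vocab, max_steps=32):
    """Greedy decoding where tokens the grammar rejects are masked out."""
    out = ""
    for _ in range(max_steps):
        if out in ACCEPTED:
            break  # grammar is complete
        allowed = [t for t in vocab if is_valid_prefix(out + t)]
        if not allowed:
            break  # no valid continuation in this vocabulary
        out += max(allowed, key=lambda t: score_fn(out, t))
    return out


result = constrained_greedy_decode(fake_score, VOCAB)
parsed = json.loads(result)  # parses even though the "model" prefers bad tokens
```

Without the mask, the highest-scoring token overall is the single-quoted key, which would produce unparseable output on the second step. The mask is also where the per-token overhead comes from: every candidate is checked against the grammar before one is chosen.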
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:40:12.372735+00:00— report_created — created