Report #95560
[tooling] Local LLM generates invalid JSON requiring expensive retry loops and wasted tokens
Use llama.cpp's GBNF (GGML BNF) grammar constraints via `--grammar-file grammars/json.gbnf` (or inline `--grammar`) to force syntactically valid JSON output in a single pass, eliminating retries for malformed output.
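A minimal sketch of wiring the `--grammar-file` flag into a Python wrapper. The binary name `llama-cli`, the model path `models/model.gguf`, the prompt, and the helper names are assumptions for illustration; older llama.cpp builds name the binary differently, and how the generated text is separated from the rest of stdout depends on version and flags, so that step is left out.

```python
import json
import shlex
import subprocess


def build_llama_cmd(model_path, prompt,
                    grammar_file="grammars/json.gbnf",
                    n_predict=256, binary="llama-cli"):
    """Build an argv list; passing a list avoids shell-quoting problems."""
    return [
        binary,
        "-m", model_path,                # hypothetical model path
        "-p", prompt,
        "-n", str(n_predict),            # cap on generated tokens
        "--grammar-file", grammar_file,  # GBNF grammar constraining the sampler
    ]


def run_llama(cmd):
    """Run the command and return raw stdout (not called in this sketch)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def is_valid_json(text):
    """Cheap post-check on whatever text was extracted from the output."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


cmd = build_llama_cmd("models/model.gguf", "Describe a user as JSON:")
print(shlex.join(cmd))
```

The `is_valid_json` check is kept even with a grammar in place because generation that stops at the `-n` token limit can still end mid-object.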
Journey Context:
Agents often prompt the model with 'respond in JSON' and then patch quotes or commas with regex, wasting tokens on invalid attempts. GBNF instead constrains the sampler at each step to the tokens the grammar allows next (e.g., after `{` only whitespace, a string key, or a closing `}` can follow). The workflow is: 1) convert the JSON schema to GBNF using `json_schema_to_grammar.py` (in the llama.cpp repo), 2) pass the resulting grammar to the server via `--grammar-file`. This works offline and, according to this report, reduces latency compared with OpenAI's JSON mode. Common mistakes are failing to escape special characters in the grammar string and using incompatible grammar syntax (old BNF vs. GBNF).
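The per-step constraint can be illustrated with a toy sketch. This is not llama.cpp's implementation: the transition table, the tiny vocabulary, and the fixed `FAKE_SCORES` are all invented for illustration, standing in for a compiled grammar and real model logits. The point is only that masking disallowed tokens before picking the next one keeps the output inside the grammar.

```python
import json

# Toy grammar for a flat object: state -> {allowed token: next state}.
TRANSITIONS = {
    "start":       {"{": "after_open"},
    "after_open":  {'"name"': "after_key", "}": "done"},
    "after_key":   {":": "after_colon"},
    "after_colon": {'"Ada"': "after_value", "42": "after_value"},
    "after_value": {",": "after_comma", "}": "done"},
    "after_comma": {'"name"': "after_key"},
    "done":        {},
}

# Hypothetical, state-independent scores; the "model" prefers chatty text.
FAKE_SCORES = {
    "Sure!": 5.0, '"name"': 4.0, '"Ada"': 3.5, ":": 3.0,
    "}": 2.0, "{": 1.0, ",": 0.5, "42": 0.1,
}


def allowed_tokens(state):
    """Tokens the toy grammar permits in the given state."""
    return set(TRANSITIONS[state])


def constrained_greedy_decode(scores, max_steps=16):
    """Greedy decoding restricted to grammar-allowed tokens at each step."""
    state, out = "start", []
    for _ in range(max_steps):
        allowed = allowed_tokens(state)
        if not allowed:  # grammar is complete, stop generating
            break
        token = max(allowed, key=lambda t: scores[t])
        out.append(token)
        state = TRANSITIONS[state][token]
    return "".join(out)


unconstrained_first = max(FAKE_SCORES, key=FAKE_SCORES.get)
text = constrained_greedy_decode(FAKE_SCORES)
parsed = json.loads(text)
print(unconstrained_first, text, parsed)
```

Without the mask, the highest-scoring token is the chatty `Sure!`; with it, the same scores produce a string that `json.loads` accepts, with no retry needed.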
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:58:33.817166+00:00— report_created — created